Tutorial 4 shows one way to plan the design of a future experiment using EHA.

1. Load the libraries that we will be using

pkg <- c("cmdstanr", "standist", "tidyverse", "RColorBrewer", "patchwork", 
         "brms", "tidybayes", "bayesplot", "future", "parallel", "VGAM", "faux")

lapply(pkg, library, character.only = TRUE)
## This is cmdstanr version 0.8.1.9000
## - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
## - CmdStan path: /Users/Rich/.cmdstan/cmdstan-2.35.0
## - CmdStan version: 2.35.0
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: Rcpp
## 
## Loading 'brms' package (version 2.21.0). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
## 
## 
## Attaching package: 'brms'
## 
## 
## The following object is masked from 'package:stats':
## 
##     ar
## 
## 
## 
## Attaching package: 'tidybayes'
## 
## 
## The following objects are masked from 'package:brms':
## 
##     dstudent_t, pstudent_t, qstudent_t, rstudent_t
## 
## 
## This is bayesplot version 1.11.1
## 
## - Online documentation and vignettes at mc-stan.org/bayesplot
## 
## - bayesplot theme set to bayesplot::theme_default()
## 
##    * Does _not_ affect other ggplot2 plots
## 
##    * See ?bayesplot_theme_set for details on theme setting
## 
## 
## Attaching package: 'bayesplot'
## 
## 
## The following object is masked from 'package:brms':
## 
##     rhat
## 
## 
## Loading required package: stats4
## 
## Loading required package: splines
## 
## 
## Attaching package: 'VGAM'
## 
## 
## The following objects are masked from 'package:brms':
## 
##     acat, cratio, cumulative, dfrechet, dirichlet, exponential,
##     frechet, geometric, lognormal, multinomial, negbinomial, pfrechet,
##     qfrechet, rfrechet, s, sratio
## 
## 
## The following objects are masked from 'package:standist':
## 
##     erf, erfc
## 
## 
## 
## ************
## Welcome to faux. For support and examples visit:
## https://debruine.github.io/faux/
## - Get and set global package options with: faux_options()
## ************

Set options.

options(brms.backend = "cmdstanr",
        mc.cores = parallel::detectCores(),
        future.fork.enable = TRUE,
        future.rng.onMisuse = "ignore") ## automatically set in RStudio

supportsMulticore()
detectCores()
# packageVersion("cmdstanr")
# devtools::session_info("rstan")

theme settings for ggplot

theme_set(
  theme_bw() +
    theme(text = element_text(size = 22, face = "bold"), 
          title = element_text(size = 22, face = "bold"),
          legend.position = "bottom")
)

## Set the amount of dodge in figures
pd <- position_dodge(0.7)
pd2 <- position_dodge(1)

2. A brief outline of the workflow

2.1 - Main background sources

The basic approach is based upon two main sources:

The basic structure and code follow the examples outlined by Solomon Kurz in his ‘power’ blog posts and Lisa Debruine’s faux{} package.

For Solomon’s blog posts, see here: https://solomonkurz.netlify.app/tags/power/

For Lisa’s faux package, see here: https://debruine.github.io/faux/. The vignettes provided by the faux package are particularly helpful and worth following before you take a deep dive in the below code.

And for some simpler worked examples of the general workflow (i.e., simpler than using EHA), take a look at these examples by Rich, which show how you might simulate multi-level data for a range of factorial designs: https://github.com/rich-ramsey/sim_demo

2.2 - Basic workflow

The basic workflow is as follows:

  1. Fit a regression model to an existing dataset.
  2. Use the regression model parameters to simulate one new dataset.
  3. Create a function to iterate over many datasets and vary parameters of interest (e.g., sample size, trial count, effect size).
  4. Summarise the simulated data to estimate the likely power or precision of the research design options.
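As a minimal sketch of steps 2-4 (with hypothetical condition means and sample sizes, not estimates from the models below), one might simulate trial-level binary outcomes and summarise the condition difference across iterations:

```r
library(tidyverse)

set.seed(123)

## hypothetical condition means on the hazard (probability) scale;
## these are illustrative values, not estimates from the model below
sim_pars <- tibble(prime = c("con", "inc"), p = c(0.10, 0.15))

## step 2: simulate one new dataset of trial-level binary outcomes
sim_data <- function(n_pid = 6, n_trials = 100) {
  crossing(pid = 1:n_pid, sim_pars, trial = 1:n_trials) %>%
    mutate(outcome = rbinom(n(), size = 1, prob = p))
}

## step 3: iterate over many datasets, varying the sample size
sims <- crossing(iter = 1:50, n_pid = c(6, 12, 24)) %>%
  mutate(data = map(n_pid, ~ sim_data(n_pid = .x))) %>%
  unnest(data)

## step 4: summarise the condition difference per sample size
sims %>%
  group_by(iter, n_pid, prime) %>%
  summarise(outcome = mean(outcome), .groups = "drop") %>%
  pivot_wider(names_from = prime, values_from = outcome) %>%
  mutate(diff = inc - con) %>%
  group_by(n_pid) %>%
  summarise(mean_diff = mean(diff), sd_diff = sd(diff))
```

The full tutorial below does the same thing, but with condition means taken from a fitted model rather than made-up values.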

2.3 - Exceptions to the normal workflow

The basic workflow above is a modified version of that recommended by Solomon and Lisa. In a typical workflow, once the datasets are simulated, a regression model would be fit to each dataset. The parameters from each model would then be summarised to make a judgment about the likely precision or power of the design. However, in the case of Model_4 in this tutorial, which took ~8 hours to build, we cannot easily build 1000 models per variation in sample size and/or trial count, as that would take an unreasonably long time. Instead, we take a simpler approach and summarise the data per bin and condition, rather than the parameter estimates from each model. Although this is not as satisfying, we feel it can still be a useful guide for setting expectations. To further save time and computational resources, we also use a much simpler model than Model_4 in order to outline the basic approach to planning via data simulation. The details of this simpler model structure are provided below.

3. Read in the data

Here, we again use data from Panis and Schmidt (2016), but to make things simpler, we focus on only six timebins (timebins 4-9) and two conditions (con and inc).

read in and wrangle the raw data

data <- read_csv("Tutorial_4_planning/data/tidyppp_Exp1_n6.csv") %>%
  ## no mask condition only, neutral prime removed, timebins 4-9 only
  filter(no_ma == 1, no_pr < 1, timebin %in% c(4:9)) %>% 
  ## remove unnecessary columns
  select(-c(bl, target_dir, corresp, resp, respac, rel_ma, irr_ma, ran_ma, no_ma)) %>%
  ## rename the error column
  rename(error = resperr) %>% 
  ## create a new variable and some factors
  mutate(prime = if_else(con_pr == 1, "con", "inc"),
         prime = factor(prime,
                        levels = c("con", "inc")),
         timebin = factor(timebin)) %>% 
  ## simplify the data by selecting a few variables only
  select(pid, timebin, prime, outcome)
## Rows: 93660 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (21): pid, bl, target_dir, corresp, resp, respac, resperr, rt, trial_n, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 4
##     pid timebin prime outcome
##   <dbl> <fct>   <fct>   <dbl>
## 1     1 4       con         0
## 2     1 5       con         0
## 3     1 6       con         0
## 4     1 7       con         0
## 5     1 8       con         0
## 6     1 9       con         0
str(data)
## tibble [7,385 × 4] (S3: tbl_df/tbl/data.frame)
##  $ pid    : num [1:7385] 1 1 1 1 1 1 1 1 1 1 ...
##  $ timebin: Factor w/ 6 levels "4","5","6","7",..: 1 2 3 4 5 6 1 2 3 4 ...
##  $ prime  : Factor w/ 2 levels "con","inc": 1 1 1 1 1 1 2 2 2 2 ...
##  $ outcome: num [1:7385] 0 0 0 0 0 0 0 0 0 0 ...

data check

data %>% 
  distinct(timebin, prime)
## # A tibble: 12 × 2
##    timebin prime
##    <fct>   <fct>
##  1 4       con  
##  2 5       con  
##  3 6       con  
##  4 7       con  
##  5 8       con  
##  6 9       con  
##  7 4       inc  
##  8 5       inc  
##  9 6       inc  
## 10 7       inc  
## 11 8       inc  
## 12 9       inc

create summary data

data_pid <- data %>% 
  group_by(pid, timebin, prime) %>% 
  summarise(outcome = mean(outcome))
## `summarise()` has grouped output by 'pid', 'timebin'. You can override using
## the `.groups` argument.
head(data_pid)
## # A tibble: 6 × 4
## # Groups:   pid, timebin [3]
##     pid timebin prime outcome
##   <dbl> <fct>   <fct>   <dbl>
## 1     1 4       con    0     
## 2     1 4       inc    0     
## 3     1 5       con    0     
## 4     1 5       inc    0     
## 5     1 6       con    0.0667
## 6     1 6       inc    0.075
data_group <- data %>% 
  group_by(timebin, prime) %>% 
  summarise(outcome = mean(outcome))
## `summarise()` has grouped output by 'timebin'. You can override using the
## `.groups` argument.
head(data_group)
## # A tibble: 6 × 3
## # Groups:   timebin [3]
##   timebin prime outcome
##   <fct>   <fct>   <dbl>
## 1 4       con   0.00975
## 2 4       inc   0.0139 
## 3 5       con   0.0478 
## 4 5       inc   0.0423 
## 5 6       con   0.0901 
## 6 6       inc   0.0913

quick plot

p3.1 <- ggplot(data_group, aes(x=timebin, y = outcome, colour = prime)) +
   geom_line(aes(group = prime)) + geom_point() +
   geom_jitter(data=data_pid, alpha = 0.5, width = 0.1, height = 0) +
   scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
   scale_x_discrete(labels = c(1:6)) +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   labs(title = "Panis & Schmidt (2016) (N=6)",
        x = "time bins")
p3.1

check the number of trials

tally <- data %>% 
  group_by(pid, prime, timebin) %>%
  summarise(n=n(),
            sum=sum(outcome),
            .groups = "drop")
tally
## # A tibble: 72 × 5
##      pid prime timebin     n   sum
##    <dbl> <fct> <fct>   <int> <dbl>
##  1     1 con   4         120     0
##  2     1 con   5         120     0
##  3     1 con   6         120     8
##  4     1 con   7         112    19
##  5     1 con   8          93    26
##  6     1 con   9          67    28
##  7     1 inc   4         120     0
##  8     1 inc   5         120     0
##  9     1 inc   6         120     9
## 10     1 inc   7         111    12
## # ℹ 62 more rows
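As a sanity check, the discrete-time hazard in each cell is simply the number of events (sum) divided by the number of trials still at risk (n). Using a few rows of the tally above (pid 1, con):

```r
library(tibble)
library(dplyr)

## events (sum) out of trials still at risk (n), taken from the tally above
toy <- tibble(timebin = c(4, 7, 8),
              n       = c(120, 112, 93),
              sum     = c(0, 19, 26))

toy %>% mutate(hazard = sum / n)
## hazard: 0.000, ~0.170, ~0.280
```

Note how the risk set shrinks across timebins (120, 112, 93) as trials experience the event and drop out, which is what distinguishes hazard from a simple proportion over all trials.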

4. Build an initial model

It is often simpler to use an index coding approach when building regression models.

see this link for some background on an index coding approach: https://bookdown.org/content/4857/the-many-variables-the-spurious-waffles.html#many-categories.

and then maybe specify the interaction term like this… https://discourse.mc-stan.org/t/specifying-formulates-for-interaction-between-categorical-variables-with-the-index-coding-approach-in-brms/29449/3

And in the context of data simulation, index coding can be particularly useful, because it is often easier and more intuitive to set condition means rather than more complicated regression parameters, such as interaction terms.

So, the approach here is to build an initial model using an index coding approach, but with a much simpler dataset than has been used in prior tutorials.

Because we are building Bayesian regression models, which under index coding generate a posterior distribution per condition of interest, we can simply calculate contrasts of interest and the associated precision or interval estimates (95% quantile intervals, for example) to address any questions we may have about comparisons between conditions, such as inc vs. con in a particular timebin.
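As a sketch of such a contrast, here we use fake normal draws standing in for the posterior of two conditions (the means and SDs are taken from the timebin 7 estimates in the model summary later in this section); with a real fitted model, `post` would come from `as_draws_df(bi) %>% as_tibble()`:

```r
library(tibble)
library(dplyr)
library(tidybayes)

set.seed(123)

## fake draws standing in for as_draws_df(bi);
## means/SDs mimic the timebin 7 fixed effects reported below
post <- tibble(`b_timebin7:primecon` = rnorm(4000, -1.49, 0.23),
               `b_timebin7:primeinc` = rnorm(4000, -2.27, 0.64))

## inc vs con contrast in timebin 7, with a 95% quantile interval
post %>%
  mutate(diff_t7 = `b_timebin7:primeinc` - `b_timebin7:primecon`) %>%
  median_qi(diff_t7)
```

No new model is needed for each contrast; every comparison is just arithmetic on the posterior draws.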

formula

with varying effects for each timebin-by-prime combination across pid

formula = bf(outcome ~ 0 + timebin:prime +
               (0 + timebin:prime | pid))

check the priors available

get_prior(formula,
          data = data, family = bernoulli(link = "cloglog"))
##                 prior class              coef group resp dpar nlpar lb ub
##                (flat)     b                                              
##                (flat)     b timebin4:primecon                            
##                (flat)     b timebin4:primeinc                            
##                (flat)     b timebin5:primecon                            
##                (flat)     b timebin5:primeinc                            
##                (flat)     b timebin6:primecon                            
##                (flat)     b timebin6:primeinc                            
##                (flat)     b timebin7:primecon                            
##                (flat)     b timebin7:primeinc                            
##                (flat)     b timebin8:primecon                            
##                (flat)     b timebin8:primeinc                            
##                (flat)     b timebin9:primecon                            
##                (flat)     b timebin9:primeinc                            
##                lkj(1)   cor                                              
##                lkj(1)   cor                     pid                      
##  student_t(3, 0, 2.5)    sd                                          0   
##  student_t(3, 0, 2.5)    sd                     pid                  0   
##  student_t(3, 0, 2.5)    sd timebin4:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin4:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin5:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin5:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin6:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin6:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin7:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin7:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin8:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin8:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin9:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin9:primeinc   pid                  0   
##        source
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##       default
##  (vectorized)
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)

visualise priors

For an index coding model, priors are set differently than in the earlier tutorials, which did not use index coding. With index coding, each condition has a prior for the mean of that condition. As we know from plot 3.1 above, the condition averages across the six selected time bins range from near zero on the hazard probability scale to 0.50, and in principle, the more general case could involve any hazard value from zero to 1. Therefore, if we try to stick to a weakly informative approach to priors (see here for more details), the priors for the cloglog values should cover a fairly broad range.

For a look at values across the cloglog scale and how it compares to the logit scale, see here.

Ok, so let’s take a look at some cloglog values at the lower and upper end of the probability scale.

cloglog_vals = clogloglink(c(0.01,0.99))
cloglog_vals
## [1] -4.600149  1.527180
# -4.600149  1.527180
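Going the other way, the inverse cloglog transformation is p = 1 - exp(-exp(x)), which is handy for converting cloglog values back to the hazard probability scale:

```r
## inverse cloglog: recover the probability from a cloglog value
inv_cloglog <- function(x) 1 - exp(-exp(x))

## recovers (approximately) the probabilities we started from
inv_cloglog(c(-4.600149, 1.527180))
## ~0.01 and ~0.99
```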

and now take a look at some distributions

visualize("normal(0, 2)", "normal(0, 1)", "normal(0, 0.5)",
          xlim = c(-4, 4))

normal(0, 2) looks like it covers all of our likely values, even if it is a little broad.

set priors

We set the prior for each condition mean to normal(0, 2) and then use some default recommendations from McElreath (2020) for the rest.

priors <- c(
  set_prior("normal(0, 2)", class = "b"),
  set_prior("normal(0, 1)", class = "sd"),
  set_prior("lkj(2)", class = "cor")
)

run the model

This model takes ~17 minutes to build on a 2020 MacBook Pro (2 GHz Quad-Core Intel Core i5). For this tutorial, this chunk is skipped to save time. Set eval=TRUE in the code chunk to run this model.

plan(multicore)
bi <- brm(formula = formula,
        data = data, family = bernoulli(link = "cloglog"),
        prior = priors,
        iter = 2000, warmup = 1000, cores = 8, chains = 4,
        control = list(adapt_delta = 0.95),
        save_pars = save_pars(all=TRUE),
        seed = 123,
        init = 0.01,
        file = "Tutorial_4_planning/models/bi")
summary(bi)

The model built without any concerns, errors or warnings.

At this point, you would normally do the following:

  • Model checking: look at the convergence of chains and model diagnostics. We’ll skip this step for now, just to move on to the main focus of planning and data simulation.

  • Summarise and visualise the parameter estimates and/or posterior predictions. Below, we quickly summarise fixed effects from the model in cloglog values, and we also convert them to hazard values, which tend to be far easier to interpret.

visualise fixed effect parameters from model bi

read in the existing model object (if you did not build the model yourself)

bi <- readRDS("Tutorial_4_planning/models/bi.rds")
summary(bi)
##  Family: bernoulli 
##   Links: mu = cloglog 
## Formula: outcome ~ 0 + timebin:prime + (0 + timebin:prime | pid) 
##    Data: data (Number of observations: 7385) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Multilevel Hyperparameters:
## ~pid (Number of levels: 6) 
##                                          Estimate Est.Error l-95% CI u-95% CI
## sd(timebin4:primecon)                        1.06      0.53     0.15     2.24
## sd(timebin5:primecon)                        1.34      0.44     0.66     2.35
## sd(timebin6:primecon)                        0.60      0.26     0.25     1.26
## sd(timebin7:primecon)                        0.47      0.21     0.19     0.98
## sd(timebin8:primecon)                        0.58      0.25     0.24     1.24
## sd(timebin9:primecon)                        0.72      0.28     0.34     1.41
## sd(timebin4:primeinc)                        1.17      0.52     0.31     2.37
## sd(timebin5:primeinc)                        1.05      0.41     0.46     2.06
## sd(timebin6:primeinc)                        0.91      0.32     0.45     1.67
## sd(timebin7:primeinc)                        1.47      0.43     0.77     2.44
## sd(timebin8:primeinc)                        1.34      0.43     0.67     2.33
## sd(timebin9:primeinc)                        0.62      0.32     0.12     1.38
## cor(timebin4:primecon,timebin5:primecon)     0.18      0.24    -0.30     0.60
## cor(timebin4:primecon,timebin6:primecon)     0.14      0.25    -0.35     0.59
## cor(timebin5:primecon,timebin6:primecon)     0.19      0.23    -0.27     0.63
## cor(timebin4:primecon,timebin7:primecon)     0.15      0.24    -0.33     0.60
## cor(timebin5:primecon,timebin7:primecon)     0.15      0.24    -0.32     0.60
## cor(timebin6:primecon,timebin7:primecon)     0.16      0.24    -0.32     0.61
## cor(timebin4:primecon,timebin8:primecon)     0.06      0.25    -0.41     0.53
## cor(timebin5:primecon,timebin8:primecon)     0.00      0.23    -0.44     0.45
## cor(timebin6:primecon,timebin8:primecon)     0.03      0.24    -0.44     0.49
## cor(timebin7:primecon,timebin8:primecon)     0.08      0.24    -0.40     0.54
## cor(timebin4:primecon,timebin9:primecon)     0.08      0.25    -0.41     0.55
## cor(timebin5:primecon,timebin9:primecon)     0.09      0.23    -0.38     0.52
## cor(timebin6:primecon,timebin9:primecon)     0.05      0.24    -0.43     0.51
## cor(timebin7:primecon,timebin9:primecon)     0.05      0.24    -0.41     0.52
## cor(timebin8:primecon,timebin9:primecon)     0.18      0.24    -0.31     0.62
## cor(timebin4:primecon,timebin4:primeinc)     0.16      0.25    -0.34     0.61
## cor(timebin5:primecon,timebin4:primeinc)     0.22      0.23    -0.24     0.65
## cor(timebin6:primecon,timebin4:primeinc)     0.17      0.24    -0.32     0.61
## cor(timebin7:primecon,timebin4:primeinc)     0.15      0.24    -0.31     0.60
## cor(timebin8:primecon,timebin4:primeinc)     0.04      0.24    -0.42     0.49
## cor(timebin9:primecon,timebin4:primeinc)     0.08      0.24    -0.39     0.54
## cor(timebin4:primecon,timebin5:primeinc)     0.15      0.24    -0.33     0.60
## cor(timebin5:primecon,timebin5:primeinc)     0.22      0.23    -0.25     0.65
## cor(timebin6:primecon,timebin5:primeinc)     0.16      0.24    -0.31     0.61
## cor(timebin7:primecon,timebin5:primeinc)     0.16      0.25    -0.32     0.61
## cor(timebin8:primecon,timebin5:primeinc)     0.04      0.24    -0.43     0.48
## cor(timebin9:primecon,timebin5:primeinc)     0.04      0.24    -0.44     0.50
## cor(timebin4:primeinc,timebin5:primeinc)     0.17      0.24    -0.31     0.62
## cor(timebin4:primecon,timebin6:primeinc)     0.13      0.24    -0.34     0.58
## cor(timebin5:primecon,timebin6:primeinc)     0.15      0.23    -0.31     0.57
## cor(timebin6:primecon,timebin6:primeinc)     0.19      0.24    -0.29     0.64
## cor(timebin7:primecon,timebin6:primeinc)     0.18      0.24    -0.30     0.62
## cor(timebin8:primecon,timebin6:primeinc)     0.07      0.23    -0.39     0.52
## cor(timebin9:primecon,timebin6:primeinc)     0.01      0.23    -0.43     0.47
## cor(timebin4:primeinc,timebin6:primeinc)     0.15      0.23    -0.31     0.58
## cor(timebin5:primeinc,timebin6:primeinc)     0.16      0.23    -0.30     0.60
## cor(timebin4:primecon,timebin7:primeinc)     0.09      0.24    -0.39     0.53
## cor(timebin5:primecon,timebin7:primeinc)     0.08      0.22    -0.35     0.50
## cor(timebin6:primecon,timebin7:primeinc)     0.14      0.23    -0.30     0.58
## cor(timebin7:primecon,timebin7:primeinc)     0.14      0.23    -0.31     0.57
## cor(timebin8:primecon,timebin7:primeinc)     0.20      0.23    -0.25     0.62
## cor(timebin9:primecon,timebin7:primeinc)     0.11      0.23    -0.35     0.55
## cor(timebin4:primeinc,timebin7:primeinc)     0.10      0.24    -0.37     0.53
## cor(timebin5:primeinc,timebin7:primeinc)     0.11      0.23    -0.33     0.55
## cor(timebin6:primeinc,timebin7:primeinc)     0.21      0.23    -0.25     0.63
## cor(timebin4:primecon,timebin8:primeinc)     0.07      0.24    -0.40     0.54
## cor(timebin5:primecon,timebin8:primeinc)     0.03      0.22    -0.40     0.46
## cor(timebin6:primecon,timebin8:primeinc)     0.09      0.23    -0.36     0.53
## cor(timebin7:primecon,timebin8:primeinc)     0.14      0.24    -0.32     0.57
## cor(timebin8:primecon,timebin8:primeinc)     0.17      0.23    -0.31     0.59
## cor(timebin9:primecon,timebin8:primeinc)     0.01      0.23    -0.43     0.47
## cor(timebin4:primeinc,timebin8:primeinc)     0.05      0.24    -0.41     0.50
## cor(timebin5:primeinc,timebin8:primeinc)     0.11      0.23    -0.33     0.54
## cor(timebin6:primeinc,timebin8:primeinc)     0.19      0.23    -0.27     0.61
## cor(timebin7:primeinc,timebin8:primeinc)     0.25      0.22    -0.19     0.65
## cor(timebin4:primecon,timebin9:primeinc)     0.06      0.25    -0.44     0.54
## cor(timebin5:primecon,timebin9:primeinc)     0.04      0.24    -0.42     0.48
## cor(timebin6:primecon,timebin9:primeinc)     0.01      0.24    -0.46     0.50
## cor(timebin7:primecon,timebin9:primeinc)     0.04      0.24    -0.44     0.50
## cor(timebin8:primecon,timebin9:primeinc)     0.09      0.25    -0.40     0.54
## cor(timebin9:primecon,timebin9:primeinc)     0.15      0.25    -0.35     0.61
## cor(timebin4:primeinc,timebin9:primeinc)     0.05      0.25    -0.46     0.51
## cor(timebin5:primeinc,timebin9:primeinc)    -0.01      0.24    -0.47     0.47
## cor(timebin6:primeinc,timebin9:primeinc)    -0.01      0.24    -0.48     0.45
## cor(timebin7:primeinc,timebin9:primeinc)    -0.01      0.24    -0.47     0.44
## cor(timebin8:primeinc,timebin9:primeinc)    -0.04      0.24    -0.50     0.43
##                                          Rhat Bulk_ESS Tail_ESS
## sd(timebin4:primecon)                    1.00     1575      946
## sd(timebin5:primecon)                    1.00     2481     2988
## sd(timebin6:primecon)                    1.00     2305     3263
## sd(timebin7:primecon)                    1.00     2073     2563
## sd(timebin8:primecon)                    1.00     2334     2939
## sd(timebin9:primecon)                    1.00     2575     3003
## sd(timebin4:primeinc)                    1.00     2200     1735
## sd(timebin5:primeinc)                    1.00     3162     2990
## sd(timebin6:primeinc)                    1.00     2699     3051
## sd(timebin7:primeinc)                    1.00     2835     2542
## sd(timebin8:primeinc)                    1.00     3978     3135
## sd(timebin9:primeinc)                    1.00     1920     1493
## cor(timebin4:primecon,timebin5:primecon) 1.00     2579     2755
## cor(timebin4:primecon,timebin6:primecon) 1.00     3235     2896
## cor(timebin5:primecon,timebin6:primecon) 1.00     3775     3240
## cor(timebin4:primecon,timebin7:primecon) 1.00     3403     2959
## cor(timebin5:primecon,timebin7:primecon) 1.00     3891     3211
## cor(timebin6:primecon,timebin7:primecon) 1.00     3464     3294
## cor(timebin4:primecon,timebin8:primecon) 1.00     2597     2992
## cor(timebin5:primecon,timebin8:primecon) 1.00     3771     3027
## cor(timebin6:primecon,timebin8:primecon) 1.00     3425     3031
## cor(timebin7:primecon,timebin8:primecon) 1.00     3417     3089
## cor(timebin4:primecon,timebin9:primecon) 1.00     3508     3062
## cor(timebin5:primecon,timebin9:primecon) 1.00     3333     3269
## cor(timebin6:primecon,timebin9:primecon) 1.00     3878     3057
## cor(timebin7:primecon,timebin9:primecon) 1.00     3440     3278
## cor(timebin8:primecon,timebin9:primecon) 1.00     3203     2802
## cor(timebin4:primecon,timebin4:primeinc) 1.00     3562     2973
## cor(timebin5:primecon,timebin4:primeinc) 1.00     3666     2937
## cor(timebin6:primecon,timebin4:primeinc) 1.00     4437     3223
## cor(timebin7:primecon,timebin4:primeinc) 1.00     3549     3408
## cor(timebin8:primecon,timebin4:primeinc) 1.00     4029     3557
## cor(timebin9:primecon,timebin4:primeinc) 1.00     4154     3375
## cor(timebin4:primecon,timebin5:primeinc) 1.00     3651     3170
## cor(timebin5:primecon,timebin5:primeinc) 1.00     5235     3550
## cor(timebin6:primecon,timebin5:primeinc) 1.00     4083     3353
## cor(timebin7:primecon,timebin5:primeinc) 1.00     3654     3361
## cor(timebin8:primecon,timebin5:primeinc) 1.00     3569     3243
## cor(timebin9:primecon,timebin5:primeinc) 1.00     4171     3303
## cor(timebin4:primeinc,timebin5:primeinc) 1.00     3555     3378
## cor(timebin4:primecon,timebin6:primeinc) 1.00     3405     3068
## cor(timebin5:primecon,timebin6:primeinc) 1.00     4233     3473
## cor(timebin6:primecon,timebin6:primeinc) 1.00     3645     3073
## cor(timebin7:primecon,timebin6:primeinc) 1.00     3966     3441
## cor(timebin8:primecon,timebin6:primeinc) 1.00     3942     3390
## cor(timebin9:primecon,timebin6:primeinc) 1.00     3490     3451
## cor(timebin4:primeinc,timebin6:primeinc) 1.00     3394     3450
## cor(timebin5:primeinc,timebin6:primeinc) 1.00     3364     3127
## cor(timebin4:primecon,timebin7:primeinc) 1.00     2789     2925
## cor(timebin5:primecon,timebin7:primeinc) 1.00     3968     3263
## cor(timebin6:primecon,timebin7:primeinc) 1.00     3561     3246
## cor(timebin7:primecon,timebin7:primeinc) 1.00     3904     3129
## cor(timebin8:primecon,timebin7:primeinc) 1.00     4104     3483
## cor(timebin9:primecon,timebin7:primeinc) 1.00     3650     3046
## cor(timebin4:primeinc,timebin7:primeinc) 1.00     3065     3355
## cor(timebin5:primeinc,timebin7:primeinc) 1.00     3444     3085
## cor(timebin6:primeinc,timebin7:primeinc) 1.00     3049     3637
## cor(timebin4:primecon,timebin8:primeinc) 1.00     3448     3249
## cor(timebin5:primecon,timebin8:primeinc) 1.00     4224     2985
## cor(timebin6:primecon,timebin8:primeinc) 1.00     3937     3479
## cor(timebin7:primecon,timebin8:primeinc) 1.00     3616     3515
## cor(timebin8:primecon,timebin8:primeinc) 1.00     3591     3346
## cor(timebin9:primecon,timebin8:primeinc) 1.00     4010     3616
## cor(timebin4:primeinc,timebin8:primeinc) 1.00     2563     2990
## cor(timebin5:primeinc,timebin8:primeinc) 1.00     3669     3445
## cor(timebin6:primeinc,timebin8:primeinc) 1.00     3298     3284
## cor(timebin7:primeinc,timebin8:primeinc) 1.00     2613     3550
## cor(timebin4:primecon,timebin9:primeinc) 1.00     4337     3141
## cor(timebin5:primecon,timebin9:primeinc) 1.00     4579     3537
## cor(timebin6:primecon,timebin9:primeinc) 1.00     4460     3489
## cor(timebin7:primecon,timebin9:primeinc) 1.00     3474     3310
## cor(timebin8:primecon,timebin9:primeinc) 1.00     3563     3130
## cor(timebin9:primecon,timebin9:primeinc) 1.00     4193     3345
## cor(timebin4:primeinc,timebin9:primeinc) 1.00     3087     3390
## cor(timebin5:primeinc,timebin9:primeinc) 1.00     3446     3189
## cor(timebin6:primeinc,timebin9:primeinc) 1.00     3237     3300
## cor(timebin7:primeinc,timebin9:primeinc) 1.00     3336     3373
## cor(timebin8:primeinc,timebin9:primeinc) 1.00     3066     3561
## 
## Regression Coefficients:
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## timebin4:primecon    -4.66      0.63    -5.96    -3.44 1.00     2425     2603
## timebin5:primecon    -3.24      0.61    -4.43    -2.01 1.00     1680     1803
## timebin6:primecon    -2.32      0.30    -2.93    -1.72 1.00     2041     2286
## timebin7:primecon    -1.49      0.23    -1.96    -1.02 1.00     2030     1901
## timebin8:primecon    -1.21      0.28    -1.77    -0.63 1.00     1783     1971
## timebin9:primecon    -0.60      0.32    -1.23     0.07 1.00     1931     2216
## timebin4:primeinc    -4.37      0.61    -5.57    -3.13 1.00     2868     2386
## timebin5:primeinc    -3.21      0.50    -4.18    -2.19 1.00     2349     2460
## timebin6:primeinc    -2.37      0.41    -3.12    -1.48 1.00     2078     2205
## timebin7:primeinc    -2.27      0.64    -3.56    -0.99 1.00     2088     2403
## timebin8:primeinc    -3.04      0.59    -4.14    -1.80 1.00     2662     2729
## timebin9:primeinc    -2.45      0.32    -3.11    -1.79 1.00     3116     2732
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

wrangle the posterior distribution

## take the posterior draws
post <- as_draws_df(bi) %>% 
  select(-lp__) %>% 
  as_tibble()

## create a summary, and use medians with robust = TRUE
post_summary <- posterior_summary(bi, robust = TRUE)

## just look at the fixed effects
post_qi_b <- post %>%
  select(starts_with("b_")) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>% 
  median_qi(value)
head(post_qi_b)
## # A tibble: 6 × 7
##   name                value .lower .upper .width .point .interval
##   <chr>               <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 b_timebin4:primecon -4.63  -5.96  -3.44   0.95 median qi       
## 2 b_timebin4:primeinc -4.35  -5.57  -3.13   0.95 median qi       
## 3 b_timebin5:primecon -3.24  -4.43  -2.01   0.95 median qi       
## 4 b_timebin5:primeinc -3.21  -4.18  -2.19   0.95 median qi       
## 5 b_timebin6:primecon -2.32  -2.93  -1.72   0.95 median qi       
## 6 b_timebin6:primeinc -2.38  -3.12  -1.48   0.95 median qi
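To express these cloglog-scale summaries as hazard probabilities, apply the inverse link, 1 - exp(-exp(x)), to the point estimate and interval bounds. For example, the timebin 4 con estimate of -4.63 maps to a hazard of ~0.0097, close to the observed condition mean for that bin in section 3:

```r
## convert the fixed-effect summary to the hazard probability scale
post_qi_b %>%
  mutate(across(c(value, .lower, .upper), ~ 1 - exp(-exp(.x))))
```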

a quick plot using posterior samples and tidybayes

tidy_fixed <- post %>% 
  select(starts_with("b_"), .chain, .iteration, .draw) %>% # select and rename in simpler labels.
  rename(chain=.chain, iter=.iteration, draw=.draw) %>% 
  pivot_longer(-c(chain, draw, iter)) %>% # move from wide to long
  mutate(key = factor(name, levels=unique(name)),
         bin = rep(1:6, length.out = 48000),
         prime = rep(c("con", "inc"), each = 6, length.out = 48000),
         bin=factor(bin),
         prime=factor(prime))
head(tidy_fixed)
## # A tibble: 6 × 8
##   chain  iter  draw name                 value key                 bin   prime
##   <int> <int> <int> <chr>                <dbl> <fct>               <fct> <fct>
## 1     1     1     1 b_timebin4:primecon -6.09  b_timebin4:primecon 1     con  
## 2     1     1     1 b_timebin5:primecon -2.32  b_timebin5:primecon 2     con  
## 3     1     1     1 b_timebin6:primecon -2.78  b_timebin6:primecon 3     con  
## 4     1     1     1 b_timebin7:primecon -1.40  b_timebin7:primecon 4     con  
## 5     1     1     1 b_timebin8:primecon -1.32  b_timebin8:primecon 5     con  
## 6     1     1     1 b_timebin9:primecon -0.472 b_timebin9:primecon 6     con
tail(tidy_fixed)
## # A tibble: 6 × 8
##   chain  iter  draw name                value key                 bin   prime
##   <int> <int> <int> <chr>               <dbl> <fct>               <fct> <fct>
## 1     4  1000  4000 b_timebin4:primeinc -4.50 b_timebin4:primeinc 1     inc  
## 2     4  1000  4000 b_timebin5:primeinc -3.07 b_timebin5:primeinc 2     inc  
## 3     4  1000  4000 b_timebin6:primeinc -1.23 b_timebin6:primeinc 3     inc  
## 4     4  1000  4000 b_timebin7:primeinc -2.26 b_timebin7:primeinc 4     inc  
## 5     4  1000  4000 b_timebin8:primeinc -3.07 b_timebin8:primeinc 5     inc  
## 6     4  1000  4000 b_timebin9:primeinc -2.34 b_timebin9:primeinc 6     inc
check.labels <- tidy_fixed %>% 
  distinct(key, bin, prime) 
check.labels
## # A tibble: 12 × 3
##    key                 bin   prime
##    <fct>               <fct> <fct>
##  1 b_timebin4:primecon 1     con  
##  2 b_timebin5:primecon 2     con  
##  3 b_timebin6:primecon 3     con  
##  4 b_timebin7:primecon 4     con  
##  5 b_timebin8:primecon 5     con  
##  6 b_timebin9:primecon 6     con  
##  7 b_timebin4:primeinc 1     inc  
##  8 b_timebin5:primeinc 2     inc  
##  9 b_timebin6:primeinc 3     inc  
## 10 b_timebin7:primeinc 4     inc  
## 11 b_timebin8:primeinc 5     inc  
## 12 b_timebin9:primeinc 6     inc
# plot
p4.1 <- ggplot(tidy_fixed, aes(x = bin, y = value, fill=prime)) +  
  stat_halfeye(alpha=0.7) +
  labs(title = "Model bi cloglog",
       x = "timebin", y = "cloglog") +
  scale_fill_brewer(palette = "Dark2") 
p4.1

ggsave("Tutorial_4_planning/figures/index_cloglog.jpeg",
        width = 6, height = 4)

calculate hazards per condition per bin in the posterior distribution

tidy_haz <- tidy_fixed %>%
  select(chain, iter, draw, bin, prime, value) %>% 
  mutate(hazard = exp(value))
head(tidy_haz)
## # A tibble: 6 × 7
##   chain  iter  draw bin   prime  value  hazard
##   <int> <int> <int> <fct> <fct>  <dbl>   <dbl>
## 1     1     1     1 1     con   -6.09  0.00227
## 2     1     1     1 2     con   -2.32  0.0981 
## 3     1     1     1 3     con   -2.78  0.0619 
## 4     1     1     1 4     con   -1.40  0.247  
## 5     1     1     1 5     con   -1.32  0.266  
## 6     1     1     1 6     con   -0.472 0.624

plot hazards

p4.2 <- ggplot(tidy_haz, aes(x = bin, y = hazard,
                                    fill=prime)) +  
  stat_halfeye(alpha=0.7) +
  labs(title = "hazard estimates",
       x = "timebin", y = "hazard") +
  scale_fill_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) 
p4.2
## Warning: Removed 166 rows containing missing values or values outside the scale range
## (`stat_slabinterval()`).

ggsave("Tutorial_4_planning/figures/index_hazard.jpeg",
        width = 6, height = 4)
## Warning: Removed 166 rows containing missing values or values outside the scale range
## (`stat_slabinterval()`).

the warning about removed rows arises because exp() of a positive cloglog draw exceeds 1, which falls outside the 0-1 y-axis limits set in the plot.
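to see why rows get dropped, note that exp() is only a small-hazard approximation and is unbounded above, whereas the exact inverse of the cloglog link is bounded in (0, 1). a minimal base-R sketch (the bounded version is what VGAM::clogloglink(x, inverse = TRUE) computes):

```r
## exp() of a positive cloglog draw exceeds 1, so those rows fall outside
## the 0-1 y-axis limits; the exact inverse cloglog never does
x <- c(-6, -2, -0.5, 0.5)
approx_haz <- exp(x)            # small-hazard approximation; exp(0.5) > 1
exact_haz  <- 1 - exp(-exp(x))  # exact inverse cloglog; always in (0, 1)
round(cbind(x, approx_haz, exact_haz), 3)
```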

5. Simulate a single dataset

Note: For multi-level data, data simulation can be quite a convoluted process, so we strongly recommend reading the power blogs by Solomon Kurz and working through the faux package vignettes before getting stuck into the code below.

read in the previously fitted model, if it is not already loaded.

bi <- readRDS("Tutorial_4_planning/models/bi.rds")
summary(bi)

Define some parameters

# define parameters
# specify some features of the design
subj_n = 10  # number of subjects
rep_n = 200 # number of trial repeats 
cond_n = 2 # number of conditions
bin_n = 6 # number of time bins

## set fixed effects
## as a reminder
fixef <- as_tibble(fixef(bi), rownames = "term") %>%
  mutate(across(where(is.double), \(x) round(x, 2)))
fixef
## # A tibble: 12 × 5
##    term              Estimate Est.Error  Q2.5 Q97.5
##    <chr>                <dbl>     <dbl> <dbl> <dbl>
##  1 timebin4:primecon    -4.66      0.63 -5.96 -3.44
##  2 timebin5:primecon    -3.24      0.61 -4.43 -2.01
##  3 timebin6:primecon    -2.32      0.3  -2.93 -1.72
##  4 timebin7:primecon    -1.49      0.23 -1.96 -1.02
##  5 timebin8:primecon    -1.21      0.28 -1.77 -0.63
##  6 timebin9:primecon    -0.6       0.32 -1.23  0.07
##  7 timebin4:primeinc    -4.37      0.61 -5.57 -3.13
##  8 timebin5:primeinc    -3.21      0.5  -4.18 -2.19
##  9 timebin6:primeinc    -2.37      0.41 -3.12 -1.48
## 10 timebin7:primeinc    -2.27      0.64 -3.56 -0.99
## 11 timebin8:primeinc    -3.04      0.59 -4.14 -1.8 
## 12 timebin9:primeinc    -2.45      0.32 -3.11 -1.79
## b1 to b6 = timebins 4 to 9 from the original model, relabelled 1 to 6.
## c = con
b1c = -4.66
b2c = -3.24
b3c = -2.32
b4c = -1.49
b5c = -1.21
b6c = -0.60

## i = inc
b1i = -4.37
b2i = -3.21
b3i = -2.37
b4i = -2.27
b5i = -3.04
b6i = -2.45
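as a quick sanity check (a sketch, not part of the simulation pipeline), the hazards implied by these population-level values can be recovered with the inverse cloglog link; they should sit close to the group-level hazards in the raw data:

```r
## implied hazards from the fixed effects, via the inverse cloglog link
## (equivalent to VGAM::clogloglink(x, inverse = TRUE))
inv_cloglog <- function(x) 1 - exp(-exp(x))
b_con <- c(-4.66, -3.24, -2.32, -1.49, -1.21, -0.60)
b_inc <- c(-4.37, -3.21, -2.37, -2.27, -3.04, -2.45)
round(inv_cloglog(b_con), 3)  # con hazards per bin
round(inv_cloglog(b_inc), 3)  # inc hazards per bin
```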

## set varying effects by pid
## as a reminder
varcor <- VarCorr(bi)
glimpse(varcor)
## List of 1
##  $ pid:List of 3
##   ..$ sd : num [1:12, 1:4] 1.062 1.34 0.599 0.468 0.583 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   ..$ cor: num [1:12, 1:4, 1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
##   .. ..- attr(*, "dimnames")=List of 3
##   ..$ cov: num [1:12, 1:4, 1:12] 1.4119 0.2562 0.0839 0.069 0.0335 ...
##   .. ..- attr(*, "dimnames")=List of 3
str(varcor)
## List of 1
##  $ pid:List of 3
##   ..$ sd : num [1:12, 1:4] 1.062 1.34 0.599 0.468 0.583 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
##   ..$ cor: num [1:12, 1:4, 1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
##   .. ..- attr(*, "dimnames")=List of 3
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   ..$ cov: num [1:12, 1:4, 1:12] 1.4119 0.2562 0.0839 0.069 0.0335 ...
##   .. ..- attr(*, "dimnames")=List of 3
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
## extract sd
sd <- as_tibble(varcor$pid$sd, rownames = "term") %>%
  mutate(across(where(is.double), \(x) round(x, 2)))
sd
## # A tibble: 12 × 5
##    term              Estimate Est.Error  Q2.5 Q97.5
##    <chr>                <dbl>     <dbl> <dbl> <dbl>
##  1 timebin4:primecon     1.06      0.53  0.15  2.24
##  2 timebin5:primecon     1.34      0.44  0.66  2.35
##  3 timebin6:primecon     0.6       0.26  0.25  1.26
##  4 timebin7:primecon     0.47      0.21  0.19  0.98
##  5 timebin8:primecon     0.58      0.25  0.24  1.24
##  6 timebin9:primecon     0.72      0.28  0.34  1.41
##  7 timebin4:primeinc     1.17      0.52  0.31  2.37
##  8 timebin5:primeinc     1.05      0.41  0.46  2.06
##  9 timebin6:primeinc     0.91      0.32  0.45  1.67
## 10 timebin7:primeinc     1.47      0.43  0.77  2.44
## 11 timebin8:primeinc     1.34      0.43  0.67  2.33
## 12 timebin9:primeinc     0.62      0.32  0.12  1.38
u1c_sd = 1.06   # timebin4:primecon
u2c_sd = 1.34   # timebin5:primecon
u3c_sd = 0.60   # timebin6:primecon
u4c_sd = 0.47   # timebin7:primecon
u5c_sd = 0.58   # timebin8:primecon
u6c_sd = 0.72   # timebin9:primecon

u1i_sd = 1.17   # timebin4:primeinc
u2i_sd = 1.05   # timebin5:primeinc
u3i_sd = 0.91   # timebin6:primeinc
u4i_sd = 1.47   # timebin7:primeinc
u5i_sd = 1.34   # timebin8:primeinc
u6i_sd = 0.62   # timebin9:primeinc

make a correlation matrix


## take each one
sd <- varcor$pid$sd
cor <- varcor$pid$cor
cov <- varcor$pid$cov

## make it tidy
cor_mat_tidy <- as_tibble(cor, rownames = "term") %>%
  rename_with(~str_replace_all(.x, '\\.', '_')) %>% 
  pivot_longer(-term,
               values_to = "value",
               names_to = c("parameter", "term2"),
               names_pattern = "([a-zA-Z_\\d?]+)_([a-zA-Z:\\d?]+)") %>%
  ## pivot longer, using names_pattern to split each column name at the
  ## separator ("_"): the first capture group takes the parameter part
  ## (e.g., "Estimate") and the second takes the term (e.g., "timebin4:primecon").
  ## the rest of this code just turns the variables into what we want
  filter(parameter == "Estimate") %>%
  select(-parameter) %>% 
  pivot_wider(names_from = "term2",
              values_from = "value") %>% 
  select(-term)
head(cor_mat_tidy)
## # A tibble: 6 × 12
##   `timebin4:primecon` `timebin5:primecon` `timebin6:primecon`
##                 <dbl>               <dbl>               <dbl>
## 1              1                  0.181                0.140 
## 2              0.181              1                    0.194 
## 3              0.140              0.194                1     
## 4              0.151              0.152                0.160 
## 5              0.0557             0.00218              0.0334
## 6              0.0784             0.0865               0.0480
## # ℹ 9 more variables: `timebin7:primecon` <dbl>, `timebin8:primecon` <dbl>,
## #   `timebin9:primecon` <dbl>, `timebin4:primeinc` <dbl>,
## #   `timebin5:primeinc` <dbl>, `timebin6:primeinc` <dbl>,
## #   `timebin7:primeinc` <dbl>, `timebin8:primeinc` <dbl>,
## #   `timebin9:primeinc` <dbl>
str(cor_mat_tidy)
## tibble [12 × 12] (S3: tbl_df/tbl/data.frame)
##  $ timebin4:primecon: num [1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
##  $ timebin5:primecon: num [1:12] 0.18111 1 0.19399 0.15233 0.00218 ...
##  $ timebin6:primecon: num [1:12] 0.1402 0.194 1 0.1605 0.0334 ...
##  $ timebin7:primecon: num [1:12] 0.1506 0.1523 0.1605 1 0.0786 ...
##  $ timebin8:primecon: num [1:12] 0.05571 0.00218 0.03338 0.07859 1 ...
##  $ timebin9:primecon: num [1:12] 0.0784 0.0865 0.048 0.0548 0.181 ...
##  $ timebin4:primeinc: num [1:12] 0.157 0.2193 0.169 0.1509 0.0387 ...
##  $ timebin5:primeinc: num [1:12] 0.1534 0.2218 0.1637 0.1635 0.0405 ...
##  $ timebin6:primeinc: num [1:12] 0.1287 0.1457 0.1891 0.1826 0.0658 ...
##  $ timebin7:primeinc: num [1:12] 0.0853 0.0849 0.1443 0.143 0.2001 ...
##  $ timebin8:primeinc: num [1:12] 0.0739 0.031 0.0924 0.1406 0.1671 ...
##  $ timebin9:primeinc: num [1:12] 0.06086 0.03899 0.00569 0.04339 0.08991 ...
## check with the model summary
# summary(bi)

## remove the names and convert to a plain matrix, which is the format faux expects
## full cor mat
cor_mat <- unname(as.matrix(cor_mat_tidy))
cor_mat
##             [,1]        [,2]        [,3]       [,4]        [,5]       [,6]
##  [1,] 1.00000000 0.181111610 0.140227052 0.15064403 0.055708399 0.07844122
##  [2,] 0.18111161 1.000000000 0.193985724 0.15232674 0.002180379 0.08649338
##  [3,] 0.14022705 0.193985724 1.000000000 0.16047883 0.033383923 0.04798801
##  [4,] 0.15064403 0.152326739 0.160478829 1.00000000 0.078594351 0.05482625
##  [5,] 0.05570840 0.002180379 0.033383923 0.07859435 1.000000000 0.18103101
##  [6,] 0.07844122 0.086493375 0.047988009 0.05482625 0.181031007 1.00000000
##  [7,] 0.15700680 0.219264583 0.168973923 0.15088219 0.038736009 0.08194166
##  [8,] 0.15340135 0.221750842 0.163709105 0.16348048 0.040533384 0.03581839
##  [9,] 0.12869470 0.145655326 0.189094128 0.18261210 0.065750253 0.01304757
## [10,] 0.08533370 0.084865176 0.144296210 0.14300113 0.200118561 0.11201101
## [11,] 0.07387438 0.030988328 0.092439132 0.14063153 0.167079452 0.01372982
## [12,] 0.06086174 0.038991659 0.005693736 0.04338987 0.089912956 0.15443839
##             [,7]        [,8]        [,9]       [,10]       [,11]        [,12]
##  [1,] 0.15700680  0.15340135  0.12869470  0.08533370  0.07387438  0.060861737
##  [2,] 0.21926458  0.22175084  0.14565533  0.08486518  0.03098833  0.038991659
##  [3,] 0.16897392  0.16370910  0.18909413  0.14429621  0.09243913  0.005693736
##  [4,] 0.15088219  0.16348048  0.18261210  0.14300113  0.14063153  0.043389867
##  [5,] 0.03873601  0.04053338  0.06575025  0.20011856  0.16707945  0.089912956
##  [6,] 0.08194166  0.03581839  0.01304757  0.11201101  0.01372982  0.154438395
##  [7,] 1.00000000  0.17417672  0.15048835  0.09722597  0.04902133  0.047558257
##  [8,] 0.17417672  1.00000000  0.15883356  0.11194773  0.11333808 -0.011738558
##  [9,] 0.15048835  0.15883356  1.00000000  0.21397253  0.18646605 -0.014087671
## [10,] 0.09722597  0.11194773  0.21397253  1.00000000  0.25374893 -0.012638958
## [11,] 0.04902133  0.11333808  0.18646605  0.25374893  1.00000000 -0.039673970
## [12,] 0.04755826 -0.01173856 -0.01408767 -0.01263896 -0.03967397  1.000000000
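before handing a correlation matrix to faux, it can be worth confirming it is well formed, since simulating from a non-positive-definite matrix will fail. a small sketch using base R (check_cor_mat is a hypothetical helper, not part of faux):

```r
## sanity checks for a correlation matrix: square, symmetric, unit diagonal,
## and positive definite (all eigenvalues > 0)
check_cor_mat <- function(m) {
  stopifnot(is.matrix(m),
            isSymmetric(m),
            all(abs(diag(m) - 1) < 1e-8),
            all(eigen(m, only.values = TRUE)$values > 0))
  invisible(TRUE)
}
## e.g., check_cor_mat(cor_mat)
check_cor_mat(matrix(c(1, 0.3, 0.3, 1), 2, 2))
```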

set up the data structure

# make it reproducible
set.seed(1)

d1 <- add_random(subj = subj_n, rep = rep_n) %>%
  add_within("subj", condition = c("cond1", "cond2")) %>%
  add_contrast("condition", "treatment", add_cols = TRUE, 
               colnames = c("cond")) %>%
  add_within("rep", bin = 1:bin_n) %>%
  ### create new conditions here that respect the twelve levels b1c, b1i etc.
  mutate(con = if_else(condition == "cond1", 1, 0),
         inc = if_else(condition == "cond2", 1, 0)) %>% 
  mutate(bin1 = if_else(bin == 1, 1, 0),
         bin2 = if_else(bin == 2, 1, 0),
         bin3 = if_else(bin == 3, 1, 0),
         bin4 = if_else(bin == 4, 1, 0),
         bin5 = if_else(bin == 5, 1, 0),
         bin6 = if_else(bin == 6, 1, 0)) %>% 
  # add random effects 
  add_ranef("subj", u1c = u1c_sd, u2c = u2c_sd, u3c = u3c_sd, u4c = u4c_sd,
            u5c = u5c_sd, u6c = u6c_sd,
            u1i = u1i_sd, u2i = u2i_sd, u3i = u3i_sd, u4i = u4i_sd,
            u5i = u5i_sd, u6i = u6i_sd, 
            .cors = cor_mat) %>% 
  # calculate the linear predictor on the cloglog scale
  mutate(cloglog = (b1c + u1c) * con * bin1 + (b2c + u2c) * con * bin2 + 
           (b3c + u3c) * con * bin3 + (b4c + u4c) * con * bin4  + 
           (b5c + u5c) * con * bin5 + (b6c + u6c) * con * bin6 +
           (b1i + u1i) * inc * bin1 + (b2i + u2i) * inc * bin2 + 
           (b3i + u3i) * inc * bin3 + (b4i + u4i) * inc * bin4  + 
           (b5i + u5i) * inc * bin5 + (b6i + u6i) * inc * bin6) %>% 
  # apply the inverse cloglog link to get the event probability
  mutate(prob = clogloglink(cloglog, inverse = TRUE)) %>%  
  # calculate event
  mutate(event = rbinom(n(), 1, prob)) 
head(d1)
## # A tibble: 6 × 28
##   subj   rep    condition  cond   bin   con   inc  bin1  bin2  bin3  bin4  bin5
##   <chr>  <chr>  <fct>     <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 subj01 rep001 cond1         0     1     1     0     1     0     0     0     0
## 2 subj01 rep001 cond1         0     2     1     0     0     1     0     0     0
## 3 subj01 rep001 cond1         0     3     1     0     0     0     1     0     0
## 4 subj01 rep001 cond1         0     4     1     0     0     0     0     1     0
## 5 subj01 rep001 cond1         0     5     1     0     0     0     0     0     1
## 6 subj01 rep001 cond1         0     6     1     0     0     0     0     0     0
## # ℹ 16 more variables: bin6 <dbl>, u1c <dbl>, u2c <dbl>, u3c <dbl>, u4c <dbl>,
## #   u5c <dbl>, u6c <dbl>, u1i <dbl>, u2i <dbl>, u3i <dbl>, u4i <dbl>,
## #   u5i <dbl>, u6i <dbl>, cloglog <dbl>, prob <dbl>, event <int>
str(d1)
## tibble [24,000 × 28] (S3: tbl_df/tbl/data.frame)
##  $ subj     : chr [1:24000] "subj01" "subj01" "subj01" "subj01" ...
##  $ rep      : chr [1:24000] "rep001" "rep001" "rep001" "rep001" ...
##  $ condition: Factor w/ 2 levels "cond1","cond2": 1 1 1 1 1 1 2 2 2 2 ...
##   ..- attr(*, "contrasts")= num [1:2, 1] 0 1
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "cond1" "cond2"
##   .. .. ..$ : chr ".cond2-cond1"
##  $ cond     : num [1:24000] 0 0 0 0 0 0 1 1 1 1 ...
##  $ bin      : int [1:24000] 1 2 3 4 5 6 1 2 3 4 ...
##  $ con      : num [1:24000] 1 1 1 1 1 1 0 0 0 0 ...
##  $ inc      : num [1:24000] 0 0 0 0 0 0 1 1 1 1 ...
##  $ bin1     : num [1:24000] 1 0 0 0 0 0 1 0 0 0 ...
##  $ bin2     : num [1:24000] 0 1 0 0 0 0 0 1 0 0 ...
##  $ bin3     : num [1:24000] 0 0 1 0 0 0 0 0 1 0 ...
##  $ bin4     : num [1:24000] 0 0 0 1 0 0 0 0 0 1 ...
##  $ bin5     : num [1:24000] 0 0 0 0 1 0 0 0 0 0 ...
##  $ bin6     : num [1:24000] 0 0 0 0 0 1 0 0 0 0 ...
##  $ u1c      : num [1:24000] 0.96 0.96 0.96 0.96 0.96 ...
##  $ u2c      : num [1:24000] 1.02 1.02 1.02 1.02 1.02 ...
##  $ u3c      : num [1:24000] 0.0704 0.0704 0.0704 0.0704 0.0704 ...
##  $ u4c      : num [1:24000] 0.515 0.515 0.515 0.515 0.515 ...
##  $ u5c      : num [1:24000] -0.481 -0.481 -0.481 -0.481 -0.481 ...
##  $ u6c      : num [1:24000] -0.533 -0.533 -0.533 -0.533 -0.533 ...
##  $ u1i      : num [1:24000] 2.13 2.13 2.13 2.13 2.13 ...
##  $ u2i      : num [1:24000] 0.225 0.225 0.225 0.225 0.225 ...
##  $ u3i      : num [1:24000] 2.32 2.32 2.32 2.32 2.32 ...
##  $ u4i      : num [1:24000] -1.13 -1.13 -1.13 -1.13 -1.13 ...
##  $ u5i      : num [1:24000] 0.228 0.228 0.228 0.228 0.228 ...
##  $ u6i      : num [1:24000] 0.203 0.203 0.203 0.203 0.203 ...
##  $ cloglog  : num [1:24000] -3.7 -2.221 -2.25 -0.975 -1.691 ...
##  $ prob     : num [1:24000] 0.0244 0.1028 0.1001 0.3142 0.1683 ...
##  $ event    : int [1:24000] 0 0 1 0 0 0 0 0 1 0 ...
summary(d1)
##      subj               rep            condition          cond    
##  Length:24000       Length:24000       cond1:12000   Min.   :0.0  
##  Class :character   Class :character   cond2:12000   1st Qu.:0.0  
##  Mode  :character   Mode  :character                 Median :0.5  
##                                                      Mean   :0.5  
##                                                      3rd Qu.:1.0  
##                                                      Max.   :1.0  
##       bin           con           inc           bin1             bin2       
##  Min.   :1.0   Min.   :0.0   Min.   :0.0   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:2.0   1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :3.5   Median :0.5   Median :0.5   Median :0.0000   Median :0.0000  
##  Mean   :3.5   Mean   :0.5   Mean   :0.5   Mean   :0.1667   Mean   :0.1667  
##  3rd Qu.:5.0   3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :6.0   Max.   :1.0   Max.   :1.0   Max.   :1.0000   Max.   :1.0000  
##       bin3             bin4             bin5             bin6       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1667   Mean   :0.1667   Mean   :0.1667   Mean   :0.1667  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       u1c               u2c                u3c               u4c          
##  Min.   :-1.5283   Min.   :-3.42128   Min.   :-0.8625   Min.   :-0.47569  
##  1st Qu.:-0.1125   1st Qu.:-0.12861   1st Qu.:-0.2039   1st Qu.:-0.07439  
##  Median : 0.1928   Median : 0.09323   Median : 0.1110   Median : 0.10602  
##  Mean   : 0.1578   Mean   : 0.04817   Mean   : 0.1149   Mean   : 0.08341  
##  3rd Qu.: 0.9298   3rd Qu.: 1.01920   3rd Qu.: 0.6469   3rd Qu.: 0.30868  
##  Max.   : 0.9776   Max.   : 2.00570   Max.   : 0.8890   Max.   : 0.51506  
##       u5c               u6c                u1i                u2i         
##  Min.   :-0.5253   Min.   :-0.53275   Min.   :-2.34017   Min.   :-1.6646  
##  1st Qu.:-0.2885   1st Qu.:-0.13831   1st Qu.:-0.49470   1st Qu.:-0.4900  
##  Median : 0.1826   Median : 0.03428   Median : 0.04290   Median : 0.1043  
##  Mean   : 0.1632   Mean   : 0.10842   Mean   : 0.06744   Mean   :-0.1468  
##  3rd Qu.: 0.4613   3rd Qu.: 0.29160   3rd Qu.: 0.54098   3rd Qu.: 0.4603  
##  Max.   : 0.9183   Max.   : 1.11610   Max.   : 2.13459   Max.   : 0.6414  
##       u3i                u4i               u5i                 u6i          
##  Min.   :-1.50372   Min.   :-1.4329   Min.   :-2.828201   Min.   :-0.62983  
##  1st Qu.:-0.62668   1st Qu.:-1.0783   1st Qu.:-1.386799   1st Qu.:-0.34642  
##  Median : 0.06564   Median :-0.5760   Median :-0.002334   Median :-0.10261  
##  Mean   : 0.32543   Mean   :-0.2809   Mean   :-0.434283   Mean   :-0.02764  
##  3rd Qu.: 0.98246   3rd Qu.: 0.9700   3rd Qu.: 0.355877   3rd Qu.: 0.25641  
##  Max.   : 2.31517   Max.   : 1.2141   Max.   : 0.923774   Max.   : 0.92798  
##     cloglog             prob              event       
##  Min.   :-6.7102   Min.   :0.001218   Min.   :0.0000  
##  1st Qu.:-3.4118   1st Qu.:0.032449   1st Qu.:0.0000  
##  Median :-2.5715   Median :0.073575   Median :0.0000  
##  Mean   :-2.5876   Mean   :0.142520   Mean   :0.1432  
##  3rd Qu.:-1.4177   3rd Qu.:0.215182   3rd Qu.:0.0000  
##  Max.   : 0.5161   Max.   :0.812782   Max.   :1.0000
glimpse(d1)
## Rows: 24,000
## Columns: 28
## $ subj      <chr> "subj01", "subj01", "subj01", "subj01", "subj01", "subj01", …
## $ rep       <chr> "rep001", "rep001", "rep001", "rep001", "rep001", "rep001", …
## $ condition <fct> cond1, cond1, cond1, cond1, cond1, cond1, cond2, cond2, cond…
## $ cond      <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, …
## $ bin       <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, …
## $ con       <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, …
## $ inc       <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, …
## $ bin1      <dbl> 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ bin2      <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
## $ bin3      <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, …
## $ bin4      <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ bin5      <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ bin6      <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ u1c       <dbl> 0.9602266, 0.9602266, 0.9602266, 0.9602266, 0.9602266, 0.960…
## $ u2c       <dbl> 1.019196, 1.019196, 1.019196, 1.019196, 1.019196, 1.019196, …
## $ u3c       <dbl> 0.07038158, 0.07038158, 0.07038158, 0.07038158, 0.07038158, …
## $ u4c       <dbl> 0.5150636, 0.5150636, 0.5150636, 0.5150636, 0.5150636, 0.515…
## $ u5c       <dbl> -0.4810045, -0.4810045, -0.4810045, -0.4810045, -0.4810045, …
## $ u6c       <dbl> -0.532748, -0.532748, -0.532748, -0.532748, -0.532748, -0.53…
## $ u1i       <dbl> 2.134587, 2.134587, 2.134587, 2.134587, 2.134587, 2.134587, …
## $ u2i       <dbl> 0.2254692, 0.2254692, 0.2254692, 0.2254692, 0.2254692, 0.225…
## $ u3i       <dbl> 2.315169, 2.315169, 2.315169, 2.315169, 2.315169, 2.315169, …
## $ u4i       <dbl> -1.129815, -1.129815, -1.129815, -1.129815, -1.129815, -1.12…
## $ u5i       <dbl> 0.2283561, 0.2283561, 0.2283561, 0.2283561, 0.2283561, 0.228…
## $ u6i       <dbl> 0.2027101, 0.2027101, 0.2027101, 0.2027101, 0.2027101, 0.202…
## $ cloglog   <dbl> -3.69977343, -2.22080429, -2.24961842, -0.97493639, -1.69100…
## $ prob      <dbl> 0.02442587, 0.10284066, 0.10007104, 0.31423231, 0.16834223, …
## $ event     <int> 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## save initial data
# write_csv(d1, "Tutorial_4_planning/data/d1.csv") #

summarise the data produced so far

d1_pid <- d1 %>%
  group_by(subj, condition, bin) %>%
  summarise(
    across(cloglog:event, \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
    )
d1_pid  
## # A tibble: 120 × 6
##    subj   condition   bin cloglog   prob event
##    <chr>  <fct>     <int>   <dbl>  <dbl> <dbl>
##  1 subj01 cond1         1 -3.70   0.0244 0.02 
##  2 subj01 cond1         2 -2.22   0.103  0.125
##  3 subj01 cond1         3 -2.25   0.100  0.065
##  4 subj01 cond1         4 -0.975  0.314  0.265
##  5 subj01 cond1         5 -1.69   0.168  0.145
##  6 subj01 cond1         6 -1.13   0.275  0.285
##  7 subj01 cond2         1 -2.24   0.101  0.125
##  8 subj01 cond2         2 -2.98   0.0493 0.045
##  9 subj01 cond2         3 -0.0548 0.612  0.595
## 10 subj01 cond2         4 -3.40   0.0328 0.035
## # ℹ 110 more rows
d1_group <- d1 %>%
  group_by(condition, bin) %>%
  summarise(
    across(cloglog:event, \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
    )
d1_group
## # A tibble: 12 × 5
##    condition   bin cloglog   prob  event
##    <fct>     <int>   <dbl>  <dbl>  <dbl>
##  1 cond1         1  -4.50  0.0136 0.013 
##  2 cond1         2  -3.19  0.0699 0.078 
##  3 cond1         3  -2.21  0.118  0.114 
##  4 cond1         4  -1.41  0.223  0.202 
##  5 cond1         5  -1.05  0.314  0.304 
##  6 cond1         6  -0.492 0.466  0.49  
##  7 cond2         1  -4.30  0.0236 0.0245
##  8 cond2         2  -3.36  0.0423 0.0455
##  9 cond2         3  -2.04  0.193  0.192 
## 10 cond2         4  -2.55  0.109  0.108 
## 11 cond2         5  -3.47  0.0489 0.053 
## 12 cond2         6  -2.48  0.0879 0.094

a quick plot

cloglog

p5.1 <- ggplot(d1_group, aes(x=bin, y = cloglog, colour = condition)) +
   geom_line(aes(group = condition)) + 
   geom_point() +
   geom_jitter(data=d1_pid, alpha = 0.5, width = 0.1, height = 0) +
   # scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
   scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   labs(title = "cloglog - simulation d1 (N=10)",
        x = "time bins")
p5.1

event prob

p5.2 <- ggplot(d1_group, aes(x=bin, y = event, colour = condition)) +
   geom_line(aes(group = condition)) + 
   geom_point() +
   geom_jitter(data=d1_pid, alpha = 0.5, width = 0.1, height = 0) +
   scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
   scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   labs(title = "event prob - simulation d1 (N=10)",
        x = "time bins")
p5.2

convert the data into outcome data (this next chunk could be incorporated into the data structure chunk above; it is done separately here so that you can see what is initially generated)

d1_out <- d1 %>% 
  group_by(subj, rep, condition) %>%
  mutate(cumsum = cumsum(event),
         outcomel = event==1 & cumsum(event) < 2,
         outcomen = if_else(event==1 & cumsum(event) == 1, 1,
                     if_else(cumsum(event) >= 1, NA, 0))) %>% 
  drop_na(outcomen) %>% 
  ungroup()
head(d1_out)
## # A tibble: 6 × 31
##   subj   rep    condition  cond   bin   con   inc  bin1  bin2  bin3  bin4  bin5
##   <chr>  <chr>  <fct>     <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 subj01 rep001 cond1         0     1     1     0     1     0     0     0     0
## 2 subj01 rep001 cond1         0     2     1     0     0     1     0     0     0
## 3 subj01 rep001 cond1         0     3     1     0     0     0     1     0     0
## 4 subj01 rep001 cond2         1     1     0     1     1     0     0     0     0
## 5 subj01 rep001 cond2         1     2     0     1     0     1     0     0     0
## 6 subj01 rep001 cond2         1     3     0     1     0     0     1     0     0
## # ℹ 19 more variables: bin6 <dbl>, u1c <dbl>, u2c <dbl>, u3c <dbl>, u4c <dbl>,
## #   u5c <dbl>, u6c <dbl>, u1i <dbl>, u2i <dbl>, u3i <dbl>, u4i <dbl>,
## #   u5i <dbl>, u6i <dbl>, cloglog <dbl>, prob <dbl>, event <int>, cumsum <int>,
## #   outcomel <lgl>, outcomen <dbl>
## save outcome data
# write_csv(d1_out, "Tutorial_4_planning/data/d1_out.csv")
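as a toy illustration of the outcome coding above (hypothetical event sequence for one trial): bins up to and including the first event are kept, while later bins become NA and are dropped, giving the person-period format that discrete-time EHA models expect:

```r
## one trial across six bins; the first event occurs in bin 3
event <- c(0, 0, 1, 0, 1, 0)
cs <- cumsum(event)
outcomen <- ifelse(event == 1 & cs == 1, 1,
                   ifelse(cs >= 1, NA, 0))
outcomen                    # 0 0 1 NA NA NA
outcomen[!is.na(outcomen)]  # 0 0 1: only bins up to the first event remain
```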

summarise the outcome data, which is generated from the prob values above

d1_out_pid <- d1_out %>% 
  group_by(subj, condition, bin) %>% 
  summarise(outcome = mean(outcomen)) %>% 
  mutate(prime = if_else(condition == "cond1", "con", "inc"),
         prime = factor(prime))
## `summarise()` has grouped output by 'subj', 'condition'. You can override using
## the `.groups` argument.
d1_out_pid
## # A tibble: 120 × 5
## # Groups:   subj, condition [20]
##    subj   condition   bin outcome prime
##    <chr>  <fct>     <int>   <dbl> <fct>
##  1 subj01 cond1         1  0.02   con  
##  2 subj01 cond1         2  0.122  con  
##  3 subj01 cond1         3  0.0698 con  
##  4 subj01 cond1         4  0.269  con  
##  5 subj01 cond1         5  0.145  con  
##  6 subj01 cond1         6  0.3    con  
##  7 subj01 cond2         1  0.125  inc  
##  8 subj01 cond2         2  0.0457 inc  
##  9 subj01 cond2         3  0.623  inc  
## 10 subj01 cond2         4  0.0317 inc  
## # ℹ 110 more rows
d1_out_group <- d1_out %>% 
  group_by(condition, bin) %>% 
  summarise(outcome = mean(outcomen)) %>% 
  mutate(prime = if_else(condition == "cond1", "con", "inc"),
         prime = factor(prime))
## `summarise()` has grouped output by 'condition'. You can override using the
## `.groups` argument.
d1_out_group
## # A tibble: 12 × 4
## # Groups:   condition [2]
##    condition   bin outcome prime
##    <fct>     <int>   <dbl> <fct>
##  1 cond1         1  0.013  con  
##  2 cond1         2  0.0775 con  
##  3 cond1         3  0.112  con  
##  4 cond1         4  0.202  con  
##  5 cond1         5  0.318  con  
##  6 cond1         6  0.492  con  
##  7 cond2         1  0.0245 inc  
##  8 cond2         2  0.0461 inc  
##  9 cond2         3  0.190  inc  
## 10 cond2         4  0.109  inc  
## 11 cond2         5  0.0528 inc  
## 12 cond2         6  0.0951 inc

plot outcome using the prime variable to make it comparable to the raw data

p5.3 <- ggplot(d1_out_group, aes(x = bin, y = outcome, colour = prime)) +
  geom_line(aes(group = prime)) + 
  geom_point() +
  geom_jitter(data=d1_out_pid, alpha=0.5, width = 0.1, height = 0) +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
  scale_colour_brewer(palette = "Dark2") +
  labs(title = "simulation d1 (N=10)",
       x = "time bins",
       y = "outcome")
p5.3

compare the raw data to the simulated data for a single simulated dataset

p5.4 <- (p3.1 / p5.3) +
   plot_layout(guides = 'collect',
               axes = 'collect')
p5.4

ggsave("Tutorial_4_planning/figures/raw_vs_d1.jpeg",
        width = 8, height = 8)

ok, this looks reasonably sensible.

6. Think about effect sizes

take a look at avg hazards and hazard ratios at the group level in the original data

p3.1

this is the descriptive data per bin and condition for the hazards: the probability of a response/event in a given bin, given that it has not already occurred
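as a minimal numeric illustration of that definition (toy counts, not the real data): the discrete-time hazard in each bin is the number of events divided by the number of trials still at risk at the start of that bin:

```r
## toy counts: 100 trials, with events observed in three successive bins
events  <- c(5, 10, 20)
at_risk <- 100
hazard  <- numeric(length(events))
for (b in seq_along(events)) {
  hazard[b] <- events[b] / at_risk  # P(event in bin b | no event before bin b)
  at_risk <- at_risk - events[b]    # survivors carry forward to the next bin
}
round(hazard, 3)  # 0.050 0.105 0.235
```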

this is the group level summary data plotted in dat_plot

data_group
## # A tibble: 12 × 3
## # Groups:   timebin [6]
##    timebin prime outcome
##    <fct>   <fct>   <dbl>
##  1 4       con   0.00975
##  2 4       inc   0.0139 
##  3 5       con   0.0478 
##  4 5       inc   0.0423 
##  5 6       con   0.0901 
##  6 6       inc   0.0913 
##  7 7       con   0.188  
##  8 7       inc   0.115  
##  9 8       con   0.252  
## 10 8       inc   0.0495 
## 11 9       con   0.401  
## 12 9       inc   0.0848

calculate the hazard ratio

hazard <- data_group %>% 
  pivot_wider(names_from = "prime",
              values_from = "outcome") %>% 
  mutate(diff = con-inc,
         ratio = inc/con)
hazard
## # A tibble: 6 × 5
## # Groups:   timebin [6]
##   timebin     con    inc     diff ratio
##   <fct>     <dbl>  <dbl>    <dbl> <dbl>
## 1 4       0.00975 0.0139 -0.00416 1.43 
## 2 5       0.0478  0.0423  0.00551 0.885
## 3 6       0.0901  0.0913 -0.00121 1.01 
## 4 7       0.188   0.115   0.0732  0.611
## 5 8       0.252   0.0495  0.203   0.196
## 6 9       0.401   0.0848  0.316   0.211

key hazard and hazard ratio values of interest are as follows:

## bin4 (timebin 7)
## 0.61 - 39% reduction in hazard from con to inc
## 0.115/0.188

## bin5 (timebin 8)
## 0.20 - 80% reduction in hazard from con to inc
## 0.0495/0.252

## bin6 (timebin 9)
## 0.21 - 79% reduction in hazard from con to inc
## 0.0848/0.401
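These figures can be verified directly from the hazard table above:

```r
# hazard ratios (inc/con) for the last three timebins, from the table above
round(0.115  / 0.188, 2)  # timebin 7: 0.61, a 39% reduction
round(0.0495 / 0.252, 2)  # timebin 8: 0.20, an 80% reduction
round(0.0848 / 0.401, 2)  # timebin 9: 0.21, a 79% reduction
```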

Ok, if we wanted to simulate a bunch of effect sizes, we could think about the following.

Let’s just focus on bin 6 to keep things simple. And keep in mind that we do need to think about the absolute hazard values, as well as the ratio.

So inc is close to 0.1, so let’s fix it there as a round number and calculate con from a range of hazard ratios.

Given that the effect of interest was ~80% reduction in Sven’s 2016 paper, let’s play around with a range of hazard ratios, e.g., 0.25, 0.5 and 0.75. These translate to a 75% reduction, 50% reduction and 25% reduction from con to inc, respectively.

bin6 <- tibble(
  effect_id = 1:3,
  inc = rep(0.1, times=3),
  ratio = c(0.25, 0.5, 0.75),
  con = inc/ratio,
  inc_clog = clogloglink(inc),
  con_clog = clogloglink(con)
)
bin6
## # A tibble: 3 × 6
##   effect_id   inc ratio   con inc_clog con_clog
##       <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1         1   0.1  0.25 0.4      -2.25   -0.672
## 2         2   0.1  0.5  0.2      -2.25   -1.50 
## 3         3   0.1  0.75 0.133    -2.25   -1.94
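For reference, VGAM’s clogloglink() is the complementary log-log transform, and the conversion both ways can be reproduced in base R (a quick sanity check rather than anything new):

```r
# cloglog(p) = log(-log(1 - p)); its inverse is 1 - exp(-exp(x))
cloglog     <- function(p) log(-log(1 - p))
inv_cloglog <- function(x) 1 - exp(-exp(x))

round(cloglog(0.1), 2)        # -2.25, matching inc_clog above
round(inv_cloglog(-2.25), 3)  # back to ~0.1
```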

7. Create a function to iterate through and simulate many datasets

create a function to simulate data. The default parameter values below come from the initial model fit to the data from Sven’s paper, but they can be overridden when calling the function, which we do below.

index_sim <- function(subj_n = 10, rep_n = 200, bin_n = 6,  # these can be changed when calling the function
                b1c = -4.66, b2c = -3.24, b3c = -2.32, b4c = -1.49, b5c = -1.21, b6c = -0.6, 
                b1i = -4.37, b2i = -3.21, b3i = -2.37, b4i = -2.27, b5i = -3.04, b6i = -2.45, # fixed effects
                u1c_sd = 1.06, u2c_sd = 1.34, u3c_sd = 0.60, u4c_sd = 0.47, u5c_sd = 0.58, u6c_sd = 0.72, 
                u1i_sd = 1.17, u2i_sd = 1.05, u3i_sd = 0.91, u4i_sd = 1.47, u5i_sd = 1.34, u6i_sd = 0.62, # varying effects
                cors = cor_mat,   # correlations between effects
                ... # helps the function work with pmap() below
                ) {
  # set up data structure
  data <- add_random(subj = subj_n, rep = rep_n) %>%
  add_within("subj", condition = c("cond1", "cond2")) %>%
  add_contrast("condition", "treatment", add_cols = TRUE, 
               colnames = c("cond")) %>%
  add_within("rep", bin = 1:bin_n) %>%
  ### create new conditions here that respect the twelve levels b1c, b1i etc.
  mutate(con = if_else(condition == "cond1", 1, 0),
         inc = if_else(condition == "cond2", 1, 0)) %>% 
  mutate(bin1 = if_else(bin == 1, 1, 0),
         bin2 = if_else(bin == 2, 1, 0),
         bin3 = if_else(bin == 3, 1, 0),
         bin4 = if_else(bin == 4, 1, 0),
         bin5 = if_else(bin == 5, 1, 0),
         bin6 = if_else(bin == 6, 1, 0)) %>% 
  # add random effects 
  add_ranef("subj", u1c = u1c_sd, u2c = u2c_sd, u3c = u3c_sd, u4c = u4c_sd,
            u5c = u5c_sd, u6c = u6c_sd,
            u1i = u1i_sd, u2i = u2i_sd, u3i = u3i_sd, u4i = u4i_sd,
            u5i = u5i_sd, u6i = u6i_sd, 
            .cors = cors) %>%  # use the cors argument rather than the global cor_mat
  # calculate the linear predictor on the cloglog scale
  mutate(cloglog = (b1c + u1c) * con * bin1 + (b2c + u2c) * con * bin2 + 
           (b3c + u3c) * con * bin3 + (b4c + u4c) * con * bin4  + 
           (b5c + u5c) * con * bin5 + (b6c + u6c) * con * bin6 +
           (b1i + u1i) * inc * bin1 + (b2i + u2i) * inc * bin2 + 
           (b3i + u3i) * inc * bin3 + (b4i + u4i) * inc * bin4  + 
           (b5i + u5i) * inc * bin5 + (b6i + u6i) * inc * bin6) %>% 
  # calculate inverse cloglog
  mutate(prob = clogloglink(cloglog, inverse = TRUE)) %>%  
  # calculate event
  mutate(event = rbinom(n(), 1, prob)) %>% 
  group_by(subj, rep, condition) %>%
  mutate(cumsum = cumsum(event),
         outcomel = event==1 & cumsum(event) < 2,
         outcomen = if_else(event==1 & cumsum(event) == 1, 1,
                     if_else(cumsum(event) >= 1, NA, 0))) %>% 
  drop_na(outcomen) %>%
  ungroup()

  # glimpse(data) # only use this when testing the code
  data # return the simulated data explicitly
}

Here’s a quick example of how our function works. You can change these parameters and create some different data.

test_sim <- index_sim(subj_n = 10, rep_n = 200) # if you uncomment glimpse above,
# it will let you glimpse the data that's generated. this is useful for checking / testing code purposes.
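The cumsum() bookkeeping inside index_sim() is what turns the per-bin Bernoulli draws into person-period records: bins before a trial’s first event are 0s, the first event bin is a 1, and all later bins for that trial are set to NA and dropped. The same logic in base R, on one hand-coded trial (a stripped-down sketch of that step):

```r
# one trial observed over six bins; events fire in bins 4 and 6
event <- c(0, 0, 0, 1, 0, 1)
csum  <- cumsum(event)  # 0 0 0 1 1 2

# 0 before the first event, 1 at the first event, NA afterwards
outcomen <- ifelse(event == 1 & csum == 1, 1,
            ifelse(csum >= 1, NA, 0))
outcomen                    # 0 0 0 1 NA NA
outcomen[!is.na(outcomen)]  # 0 0 0 1 -> bins 5 and 6 are censored out
```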

create a function to summarise the simulated data

index_summary <- function(df) {
  df %>% 
  group_by(subj, condition, bin) %>% 
  summarise(n = n(), 
            sum = sum(outcomen),
            mean_outcome = mean(outcomen, na.rm = TRUE),
            sd_outcome = sd(outcomen, na.rm = TRUE),
            sem = (sd_outcome/sqrt(length(unique(rep)))),
            .groups="drop") 
}

test it

test_summary <- index_summary(test_sim)
head(test_summary)
## # A tibble: 6 × 8
##   subj   condition   bin     n   sum mean_outcome sd_outcome     sem
##   <chr>  <fct>     <int> <int> <dbl>        <dbl>      <dbl>   <dbl>
## 1 subj01 cond1         1   200     3       0.015       0.122 0.00862
## 2 subj01 cond1         2   197    11       0.0558      0.230 0.0164 
## 3 subj01 cond1         3   186    15       0.0806      0.273 0.0200 
## 4 subj01 cond1         4   171    25       0.146       0.354 0.0271 
## 5 subj01 cond1         5   146    52       0.356       0.481 0.0398 
## 6 subj01 cond1         6    94    60       0.638       0.483 0.0498

8. Simulate data for a range of parameters

vary trial count per condition and effect size, keeping pid fixed at 10. This will take some time, probably hours…go get a beer/coffee. For the purposes of this tutorial, eval is set to false in the chunks below, to save time.

plan(multicore)
x1 <- crossing(
  exp = 1:1000, # number of experiment replicates
  subj_n = 10, # subject N, fixed at 10 here
  rep_n = c(100, 200, 400),
  b6i = -2.25,
  b6c = c(-1.94, -1.50, -0.67)
) %>%
  mutate(d = pmap(., index_sim)) %>% 
  mutate(s = map(d, index_summary)) %>% 
  select(-d)
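One caveat on speed: plan(multicore) only affects future-based code, and purrr::pmap() runs sequentially regardless. A sketch of one base-R alternative using the parallel package (already loaded above), with a toy stand-in for index_sim(); the function and grid names here are hypothetical:

```r
library(parallel)

# toy stand-in for index_sim(); any function of the grid columns works here
sim_fun <- function(exp, rep_n) mean(rnorm(rep_n))

grid <- expand.grid(exp = 1:4, rep_n = c(10, 20))

# run one simulation per grid row, forked across 2 cores
# (forking is not available on Windows; set mc.cores = 1 there)
res <- mclapply(seq_len(nrow(grid)),
                function(i) sim_fun(grid$exp[i], grid$rep_n[i]),
                mc.cores = 2)
length(res)  # 8: one result per parameter row
```

furrr::future_pmap() would be the future-based equivalent that actually respects plan(multicore), but furrr is not loaded in this tutorial.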

9. Summarise the simulated data

sx1 <- x1 %>%
  unnest(s) %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.94", "-1.5", "-0.67"),
                      labels = c("25%", "50%", "75%")),
         subj=factor(subj),
         bin=factor(bin))
sx1

take a look

head(sx1)
tail(sx1)

sx1 %>% 
  distinct(b6c)

save out a file

## save the first simulation
write_csv(sx1, "Tutorial_4_planning/data/sim1/sim1_data.csv")

calculate summary data

read in the file, if it is already computed

sx1 <- read_csv("Tutorial_4_planning/data/sim1/sim1_data.csv") %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.94", "-1.5", "-0.67"),
                      labels = c("25%", "50%", "75%")),
         subj=factor(subj),
         bin=factor(bin))
## Rows: 1079143 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): subj, condition
## dbl (11): exp, subj_n, rep_n, b6i, b6c, bin, n, sum, mean_outcome, sd_outcom...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sx1
## # A tibble: 1,079,143 × 13
##    exp   subj_n rep_n   b6i b6c   subj  condition bin       n   sum mean_outcome
##    <fct> <fct>  <fct> <dbl> <fct> <fct> <chr>     <fct> <dbl> <dbl>        <dbl>
##  1 1     10     100   -2.25 25%   subj… cond1     1       100     3       0.03  
##  2 1     10     100   -2.25 25%   subj… cond1     2        97    14       0.144 
##  3 1     10     100   -2.25 25%   subj… cond1     3        83    31       0.373 
##  4 1     10     100   -2.25 25%   subj… cond1     4        52     7       0.135 
##  5 1     10     100   -2.25 25%   subj… cond1     5        45     8       0.178 
##  6 1     10     100   -2.25 25%   subj… cond1     6        37     7       0.189 
##  7 1     10     100   -2.25 25%   subj… cond2     1       100    12       0.12  
##  8 1     10     100   -2.25 25%   subj… cond2     2        88     2       0.0227
##  9 1     10     100   -2.25 25%   subj… cond2     3        86     8       0.0930
## 10 1     10     100   -2.25 25%   subj… cond2     4        78    15       0.192 
## # ℹ 1,079,133 more rows
## # ℹ 2 more variables: sd_outcome <dbl>, sem <dbl>

at the exp level

sx1_exp <- sx1 %>% 
  group_by(exp, subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(exp_mean_outcome = mean(mean_outcome, na.rm = TRUE),
            exp_mean_sum=mean(sum, na.rm = TRUE),
            exp_sum=sum(sum),
            n = length(unique(subj)), 
            sd=sd(mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(subj)))),
            .groups = "drop")
sx1_exp
## # A tibble: 108,000 × 12
##    exp   subj_n rep_n b6c   condition bin   exp_mean_outcome exp_mean_sum
##    <fct> <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>
##  1 1     10     100   25%   cond1     1               0.013           1.3
##  2 1     10     100   25%   cond1     2               0.0513          5  
##  3 1     10     100   25%   cond1     3               0.174          15.9
##  4 1     10     100   25%   cond1     4               0.207          16.4
##  5 1     10     100   25%   cond1     5               0.249          15.4
##  6 1     10     100   25%   cond1     6               0.146           5.7
##  7 1     10     100   25%   cond2     1               0.023           2.3
##  8 1     10     100   25%   cond2     2               0.0215          2.1
##  9 1     10     100   25%   cond2     3               0.0988          9.5
## 10 1     10     100   25%   cond2     4               0.0876          7.5
## # ℹ 107,990 more rows
## # ℹ 4 more variables: exp_sum <dbl>, n <int>, sd <dbl>, sem <dbl>

at the sim level

sx1_sim <- sx1_exp %>% 
  group_by(subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(sim_mean_outcome = mean(exp_mean_outcome, na.rm = TRUE),
            sim_mean_sum = mean(exp_mean_sum, na.rm = TRUE),
            sim_sum = mean(exp_sum, na.rm = TRUE),
            n = length(unique(exp)), 
            sd=sd(exp_mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(exp)))),
            .groups = "drop")
sx1_sim
## # A tibble: 108 × 11
##    subj_n rep_n b6c   condition bin   sim_mean_outcome sim_mean_sum sim_sum
##    <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>   <dbl>
##  1 10     100   25%   cond1     1               0.0161         1.61    16.1
##  2 10     100   25%   cond1     2               0.0797         7.81    78.1
##  3 10     100   25%   cond1     3               0.108          9.67    96.7
##  4 10     100   25%   cond1     4               0.217         17.3    173. 
##  5 10     100   25%   cond1     5               0.282         17.8    178. 
##  6 10     100   25%   cond1     6               0.162          7.18    71.8
##  7 10     100   25%   cond2     1               0.0232         2.32    23.2
##  8 10     100   25%   cond2     2               0.0644         6.27    62.7
##  9 10     100   25%   cond2     3               0.123         11.1    111. 
## 10 10     100   25%   cond2     4               0.184         14.3    143. 
## # ℹ 98 more rows
## # ℹ 3 more variables: n <int>, sd <dbl>, sem <dbl>

a quick plot of the data

p9.1 <- ggplot(sx1_sim, 
                aes(x = bin, y = sim_mean_outcome, 
                    colour = condition)) +
  geom_jitter(data=sx1_exp, aes(y = exp_mean_outcome),
              alpha=0.5, width = 0.1, height = 0) +
  geom_line(aes(group = condition)) + 
  geom_point(size=2, colour = "black") +
  geom_errorbar(aes(ymin = sim_mean_outcome-sem*1.96, ymax = sim_mean_outcome+sem*1.96),
                width=.2, colour = "black") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="timebin",
       y="outcome") + 
  facet_grid(fct_rev(rep_n)~b6c)
p9.1

ggsave ("Tutorial_4_planning/figures/sim1/sim_violin.jpeg",
        width = 10, height = 8, dpi=800)

ok, look at some values in bin 6

at the exp level

sx1_exp_check <- sx1_exp %>%
  filter(bin %in% c(6)) %>% 
  select(exp, rep_n, b6c, condition, bin, exp_mean_outcome) %>%
  pivot_wider(names_from = "condition",
              values_from = "exp_mean_outcome") %>% 
  group_by(exp, rep_n, b6c, bin) %>% 
  mutate(ratio = cond2/cond1) 
head(sx1_exp_check)
## # A tibble: 6 × 7
## # Groups:   exp, rep_n, b6c, bin [6]
##   exp   rep_n b6c   bin   cond1  cond2 ratio
##   <fct> <fct> <fct> <fct> <dbl>  <dbl> <dbl>
## 1 1     100   25%   6     0.146 0.116  0.797
## 2 1     100   50%   6     0.268 0.118  0.442
## 3 1     100   75%   6     0.454 0.117  0.259
## 4 1     200   25%   6     0.170 0.147  0.862
## 5 1     200   50%   6     0.225 0.106  0.473
## 6 1     200   75%   6     0.375 0.0940 0.250

and at the sim level

sx1_sim_check <- sx1_exp_check %>%
  group_by(rep_n, b6c, bin) %>% 
  summarise(sim_ratio = mean(ratio),
            sim_sd=sd(ratio),
            sim_sem=sim_sd/sqrt(1000),
            .groups = "drop")
head(sx1_sim_check)
## # A tibble: 6 × 6
##   rep_n b6c   bin   sim_ratio sim_sd sim_sem
##   <fct> <fct> <fct>     <dbl>  <dbl>   <dbl>
## 1 100   25%   6         0.756 0.264  0.00834
## 2 100   50%   6         0.527 0.167  0.00529
## 3 100   75%   6         0.282 0.0798 0.00252
## 4 200   25%   6         0.742 0.221  0.00700
## 5 200   50%   6         0.519 0.148  0.00469
## 6 200   75%   6         0.275 0.0724 0.00229

plot ratio values

p9.2 <- ggplot(sx1_sim_check, 
                aes(x = b6c, y = sim_ratio)) +
  geom_point(size=1.5) +
  geom_errorbar(aes(ymin = sim_ratio-sim_sem*1.96, ymax = sim_ratio+sim_sem*1.96)) + 
  geom_hline(yintercept = c(0.25, 0.5, 0.75), colour = "red") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="effect size (reduction in hazard)",
       y="hazard ratio (inc/con)") + 
  facet_wrap(~rep_n)
p9.2

10. Calculate power / precision

basic idea: calculate the difference score in hazard between con and inc per bin, and then report how many lower-bound 95% quantiles exclude zero. This of course is not a model-based estimate, but it should give a good guide to what we might expect.
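Before applying it to the simulations, the recipe can be illustrated on made-up numbers: generate many “experiments”, compute each one’s 95% CI on the mean difference score, and count the proportion whose lower bound clears zero (all values here are purely illustrative):

```r
set.seed(1)

# toy CI-exclusion "power": 1000 mini-experiments of n = 10 difference scores
n_exp <- 1000; n <- 10; true_diff <- 0.3; sd_diff <- 0.3

lower_bounds <- replicate(n_exp, {
  d <- rnorm(n, mean = true_diff, sd = sd_diff)
  mean(d) - 1.96 * sd(d) / sqrt(n)  # lower bound of the 95% CI
})

mean(lower_bounds > 0)  # proportion of CIs excluding zero, i.e. "power"
```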

calculate difference scores

at the exp level

sx1_diff_exp <- sx1 %>%
  pivot_wider(id_cols = c(exp, subj_n, rep_n, b6c, subj, bin),
              names_from = "condition",
              values_from = "mean_outcome") %>% 
  mutate(diff = cond1 - cond2) %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  summarise(exp_mean_diff = mean(diff, na.rm = TRUE),
            exp_sd = sd(diff, na.rm = TRUE),
            n = n(), # n here is the number of pids contributing a difference score
            exp_sem = (exp_sd/sqrt(n)),
            exp_ci = 1.96*exp_sem,
            exp_dz = exp_mean_diff/exp_sd,
            .groups = "drop") 
head(sx1_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     100   25%   1           -0.01   0.0406    10  0.0128 0.0251
## 2 1     10     100   25%   2            0.0297 0.0504    10  0.0159 0.0312
## 3 1     10     100   25%   3            0.0755 0.152     10  0.0482 0.0944
## 4 1     10     100   25%   4            0.120  0.147     10  0.0466 0.0914
## 5 1     10     100   25%   5            0.0673 0.262     10  0.0828 0.162 
## 6 1     10     100   25%   6            0.0296 0.124     10  0.0391 0.0767
## # ℹ 1 more variable: exp_dz <dbl>
tail(sx1_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1000  10     400   75%   1           0.00475 0.0254    10 0.00803 0.0157
## 2 1000  10     400   75%   2          -0.0171  0.119     10 0.0377  0.0739
## 3 1000  10     400   75%   3          -0.0714  0.131     10 0.0413  0.0809
## 4 1000  10     400   75%   4           0.0297  0.200     10 0.0632  0.124 
## 5 1000  10     400   75%   5           0.103   0.248     10 0.0786  0.154 
## 6 1000  10     400   75%   6           0.334   0.187     10 0.0592  0.116 
## # ℹ 1 more variable: exp_dz <dbl>

at the group/sim level

sx1_diff_sim <- sx1_diff_exp %>% 
  group_by(subj_n, rep_n, b6c, bin) %>% 
  summarise(mean_diff = mean(exp_mean_diff, na.rm = TRUE),
            sd = sd(exp_mean_diff, na.rm = TRUE),
            n=n(),
            sem = (sd/sqrt((n))),
            ci = 1.96*sem) 
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
sx1_diff_sim
## # A tibble: 54 × 9
## # Groups:   subj_n, rep_n, b6c [9]
##    subj_n rep_n b6c   bin   mean_diff     sd     n      sem       ci
##    <fct>  <fct> <fct> <fct>     <dbl>  <dbl> <int>    <dbl>    <dbl>
##  1 10     100   25%   1      -0.00711 0.0136  1000 0.000431 0.000845
##  2 10     100   25%   2       0.0153  0.0435  1000 0.00138  0.00270 
##  3 10     100   25%   3      -0.0145  0.0401  1000 0.00127  0.00249 
##  4 10     100   25%   4       0.0326  0.0721  1000 0.00228  0.00447 
##  5 10     100   25%   5       0.190   0.0588  1000 0.00186  0.00365 
##  6 10     100   25%   6       0.0459  0.0463  1000 0.00146  0.00287 
##  7 10     100   50%   1      -0.00897 0.0152  1000 0.000482 0.000945
##  8 10     100   50%   2       0.0155  0.0395  1000 0.00125  0.00245 
##  9 10     100   50%   3      -0.0133  0.0399  1000 0.00126  0.00247 
## 10 10     100   50%   4       0.0331  0.0744  1000 0.00235  0.00461 
## # ℹ 44 more rows

plot

violin

diff in original units

p10.1 <- ggplot(sx1_diff_exp, aes(x=bin, y = exp_mean_diff,
                                colour = bin, fill = bin)) +
   geom_jitter(alpha = 0.5, width = 0.1) +
   geom_violin(alpha = 0.7) +
   geom_point(data = sx1_diff_sim, 
             aes(y = mean_diff), size = 3, position=pd2, colour="black") +
   geom_errorbar(data = sx1_diff_sim,
                aes(y = mean_diff, ymin = mean_diff-ci, ymax = mean_diff+ci),
                width=.2, position=pd2, colour = "black") +
   geom_hline(yintercept = 0, colour = "black", linetype = "dashed") +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   theme(legend.position = "none") +
   ylab("mean outcome") +
   ggtitle("difference score (cond1-cond2)") +
   facet_grid(fct_rev(rep_n)~b6c)
p10.1

ggsave("Tutorial_4_planning/figures/sim1/sim_diffs.jpeg",
       width = 10, height = 8, dpi=800)

plot each exp’s difference score and associated 95% interval

create some factors and new variables

sx1_diff_exp_2 <- sx1_diff_exp %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  mutate(lower = exp_mean_diff-exp_ci,
         upper = exp_mean_diff+exp_ci,
         above_zero = if_else(lower > 0, "yes", "no"), 
         above_zero = factor(above_zero, levels = c("no", "yes")))
sx1_diff_exp_2
## # A tibble: 54,000 × 14
## # Groups:   exp, subj_n, rep_n, b6c, bin [54,000]
##    exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##    <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
##  1 1     10     100   25%   1           -0.01   0.0406    10 0.0128  0.0251
##  2 1     10     100   25%   2            0.0297 0.0504    10 0.0159  0.0312
##  3 1     10     100   25%   3            0.0755 0.152     10 0.0482  0.0944
##  4 1     10     100   25%   4            0.120  0.147     10 0.0466  0.0914
##  5 1     10     100   25%   5            0.0673 0.262     10 0.0828  0.162 
##  6 1     10     100   25%   6            0.0296 0.124     10 0.0391  0.0767
##  7 1     10     100   50%   1            0.008  0.0181    10 0.00573 0.0112
##  8 1     10     100   50%   2           -0.0375 0.0487    10 0.0154  0.0302
##  9 1     10     100   50%   3           -0.0220 0.116     10 0.0367  0.0720
## 10 1     10     100   50%   4           -0.115  0.336     10 0.106   0.209 
## # ℹ 53,990 more rows
## # ℹ 4 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>

calculate “power” in a quick and dirty way based on 95% CI

power_x1 <- sx1_diff_exp_2 %>%
  group_by(subj_n, rep_n, b6c, bin) %>%
  mutate(check = ifelse(lower > 0, 1, 0)) %>%
  summarise(power = mean(check, na.rm = TRUE)) %>% 
  filter(bin %in% c(6))
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
power_x1
## # A tibble: 9 × 5
## # Groups:   subj_n, rep_n, b6c [9]
##   subj_n rep_n b6c   bin   power
##   <fct>  <fct> <fct> <fct> <dbl>
## 1 10     100   25%   6     0.202
## 2 10     100   50%   6     0.608
## 3 10     100   75%   6     0.993
## 4 10     200   25%   6     0.21 
## 5 10     200   50%   6     0.675
## 6 10     200   75%   6     0.999
## 7 10     400   25%   6     0.216
## 8 10     400   50%   6     0.693
## 9 10     400   75%   6     0.998

plot power

p10.2 <- ggplot(power_x1, aes(x = b6c, y=rep_n, fill = power)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.3f", power)), color = "white", size = 10) +
  scale_fill_viridis_c(limits = c(0, 1)) +
  facet_wrap(~bin) +
  labs(y="rep_n", x="b6c")
p10.2

ggsave ("Tutorial_4_planning/figures/sim1/power.jpeg",
        width = 10, height = 6, dpi=800)

join power to the df

sx1_diff_power <- sx1_diff_exp_2 %>%
  filter(bin %in% c(6)) %>% 
  inner_join(power_x1, by = c("subj_n", "rep_n", "b6c", "bin")) %>% 
  mutate(power = round(power * 100, 2)) 
head(sx1_diff_power)
## # A tibble: 6 × 15
## # Groups:   exp, subj_n, rep_n, b6c, bin [6]
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     100   25%   6            0.0296  0.124    10  0.0391 0.0767
## 2 1     10     100   50%   6            0.150   0.136    10  0.0431 0.0846
## 3 1     10     100   75%   6            0.337   0.224    10  0.0710 0.139 
## 4 1     10     200   25%   6            0.0234  0.166    10  0.0526 0.103 
## 5 1     10     200   50%   6            0.118   0.230    10  0.0727 0.143 
## 6 1     10     200   75%   6            0.281   0.259    10  0.0820 0.161 
## # ℹ 5 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>,
## #   power <dbl>

plot

p10.3 <- sx1_diff_power %>%
  ggplot(aes(x = exp, y = exp_mean_diff, ymin = lower, ymax = upper)) +
  geom_pointrange(fatten = 1/2, aes(colour=above_zero)) +
  geom_hline(yintercept = 0, colour = "red") +
  scale_colour_manual(values=c("darkgrey","black")) +
  geom_text(aes(x=700, y=-0.35, label = sprintf("%.1f%s", power, "% power")), 
            color = "darkgrey", size = 5) +
  theme(legend.position = "none") +
  labs(x = "sim # (i.e., simulation index)",
       y = "hazard difference") +
  scale_x_discrete(breaks = c(250,500,750,1000)) +
  facet_grid(fct_rev(rep_n)~b6c)
p10.3

ggsave ("Tutorial_4_planning/figures/sim1/bin6_diffs.jpeg",
        width = 10, height = 8, dpi=800)

plot power as a bar plot

p10.4 <- ggplot(power_x1, aes(x=rep_n, y=power,
                           colour = b6c, fill = b6c)) +
  geom_col(alpha = 0.5) +
  geom_hline(yintercept = 0.8, colour = "red", linetype = "dashed") +
  geom_hline(yintercept = 0.9, colour = "black", linetype = "dashed") +
  geom_hline(yintercept = 0.95, colour = "blue", linetype = "dashed") +
  scale_colour_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none") +
  facet_wrap(~b6c, nrow=1)
p10.4

ggsave ("Tutorial_4_planning/figures/sim1/bin6_power_col.jpeg",
        width = 8, height = 6, dpi=800)

ok, on the basis of this analysis, we might make the following conclusions:

  • effects of con vs inc that are as large as previously reported by Panis et al., 2016 would be easy to detect with 100 trials and 10 pids, and probably with even less data than that.

  • Across all effect sizes, there is little benefit to adding more than 200 trials per condition.

  • Within the context of these absolute hazard values, somewhere between a 50% and a 75% reduction would give you 80% power to detect an effect, should it exist. E.g., a 75% reduction gives nearly 100% power, whereas a 50% reduction gives ~70% power. So maybe a 60% reduction would give 80% power, with N=10 and trials=200. We could run more sims to figure that out.

  • Other analyses to run - how many individual pids per exp show a clear effect of con vs inc in bin 6? This would be useful to know if we used the small-N design logic and the individual as the unit of replication.
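A crude linear interpolation of the simulated power values (N=10, rep_n=200, from the power table above) supports the 60% guess; power is of course not linear in the reduction, so treat this as a rough check only:

```r
# power at 25%, 50% and 75% reductions (N = 10, rep_n = 200, from the table)
reduction <- c(25, 50, 75)
power     <- c(0.21, 0.675, 0.999)

# linearly interpolate to a 60% reduction
approx(reduction, power, xout = 60)$y  # roughly 0.8
```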

11. Run some more follow-up sims

Let’s try a 50%, 60% and 70% reduction across N = 10, 15 and 20.

let’s check effect sizes quickly.

bin6_v2 <- tibble(
  effect_id = 1:3,
  inc = rep(0.1, times=3),
  ratio = c(0.3, 0.4, 0.5),
  con = inc/ratio,
  inc_clog = clogloglink(inc),
  con_clog = clogloglink(con)
)
bin6_v2
## # A tibble: 3 × 6
##   effect_id   inc ratio   con inc_clog con_clog
##       <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1         1   0.1   0.3 0.333    -2.25   -0.903
## 2         2   0.1   0.4 0.25     -2.25   -1.25 
## 3         3   0.1   0.5 0.2      -2.25   -1.50

run the sim

vary effect size and pid. Again, as before, this will take some time, probably hours…go get a beer/coffee. And again, eval is set to FALSE, to save time for the tutorial.

plan(multicore)
x2 <- crossing(
  exp = 1:1000, # number of experiment replicates
  subj_n = c(10,15,20), # range of subject N
  rep_n = 200,
  b6i = -2.25,
  b6c = c(-0.90, -1.25, -1.50)
) %>%
  mutate(d = pmap(., index_sim)) %>% 
  mutate(s = map(d, index_summary)) %>% 
  select(-d)

summarise the simulated data

sx2 <- x2 %>%
  unnest(s) %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.5", "-1.25", "-0.9"),
                      labels = c("50%", "60%", "70%")),
         subj=factor(subj),
         bin=factor(bin))
sx2

take a look

head(sx2)
tail(sx2)

sx2 %>% 
  distinct(b6c)

save out a file

## save the second simulation
write_csv(sx2, "Tutorial_4_planning/data/sim2/sim2_data.csv")

calculate summary data

read in the file, if it is already computed

sx2 <- read_csv("Tutorial_4_planning/data/sim2/sim2_data.csv") %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.5", "-1.25", "-0.9"),
                      labels = c("50%", "60%", "70%")),
         subj=factor(subj),
         bin=factor(bin))
## Rows: 1618687 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): subj, condition
## dbl (11): exp, subj_n, rep_n, b6i, b6c, bin, n, sum, mean_outcome, sd_outcom...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sx2
## # A tibble: 1,618,687 × 13
##    exp   subj_n rep_n   b6i b6c   subj  condition bin       n   sum mean_outcome
##    <fct> <fct>  <fct> <dbl> <fct> <fct> <chr>     <fct> <dbl> <dbl>        <dbl>
##  1 1     10     200   -2.25 50%   subj… cond1     1       200     1       0.005 
##  2 1     10     200   -2.25 50%   subj… cond1     2       199     6       0.0302
##  3 1     10     200   -2.25 50%   subj… cond1     3       193     7       0.0363
##  4 1     10     200   -2.25 50%   subj… cond1     4       186    16       0.0860
##  5 1     10     200   -2.25 50%   subj… cond1     5       170    30       0.176 
##  6 1     10     200   -2.25 50%   subj… cond1     6       140    26       0.186 
##  7 1     10     200   -2.25 50%   subj… cond2     1       200     1       0.005 
##  8 1     10     200   -2.25 50%   subj… cond2     2       199     8       0.0402
##  9 1     10     200   -2.25 50%   subj… cond2     3       191    14       0.0733
## 10 1     10     200   -2.25 50%   subj… cond2     4       177     2       0.0113
## # ℹ 1,618,677 more rows
## # ℹ 2 more variables: sd_outcome <dbl>, sem <dbl>

at the exp level

sx2_exp <- sx2 %>% 
  group_by(exp, subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(exp_mean_outcome = mean(mean_outcome, na.rm = TRUE),
            exp_mean_sum=mean(sum, na.rm = TRUE),
            exp_sum=sum(sum),
            n = length(unique(subj)), 
            sd=sd(mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(subj)))),
            .groups = "drop")
sx2_exp
## # A tibble: 108,000 × 12
##    exp   subj_n rep_n b6c   condition bin   exp_mean_outcome exp_mean_sum
##    <fct> <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>
##  1 1     10     200   50%   cond1     1               0.014           2.8
##  2 1     10     200   50%   cond1     2               0.0449          8.8
##  3 1     10     200   50%   cond1     3               0.0550         10.3
##  4 1     10     200   50%   cond1     4               0.200          35.4
##  5 1     10     200   50%   cond1     5               0.305          43  
##  6 1     10     200   50%   cond1     6               0.306          28.9
##  7 1     10     200   50%   cond2     1               0.013           2.6
##  8 1     10     200   50%   cond2     2               0.0410          8.1
##  9 1     10     200   50%   cond2     3               0.123          22.3
## 10 1     10     200   50%   cond2     4               0.219          31.7
## # ℹ 107,990 more rows
## # ℹ 4 more variables: exp_sum <dbl>, n <int>, sd <dbl>, sem <dbl>

at the sim level

sx2_sim <- sx2_exp %>% 
  group_by(subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(sim_mean_outcome = mean(exp_mean_outcome, na.rm = TRUE),
            sim_mean_sum = mean(exp_mean_sum, na.rm = TRUE),
            sim_sum = mean(exp_sum, na.rm = TRUE),
            n = length(unique(exp)), 
            sd=sd(exp_mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(exp)))),
            .groups = "drop")
sx2_sim
## # A tibble: 108 × 11
##    subj_n rep_n b6c   condition bin   sim_mean_outcome sim_mean_sum sim_sum
##    <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>   <dbl>
##  1 10     200   50%   cond1     1               0.0162         3.25    32.5
##  2 10     200   50%   cond1     2               0.0776        15.2    152. 
##  3 10     200   50%   cond1     3               0.108         19.4    194. 
##  4 10     200   50%   cond1     4               0.218         35.0    350. 
##  5 10     200   50%   cond1     5               0.285         36.0    360. 
##  6 10     200   50%   cond1     6               0.236         20.9    209. 
##  7 10     200   50%   cond2     1               0.0235         4.70    47.0
##  8 10     200   50%   cond2     2               0.0650        12.6    126. 
##  9 10     200   50%   cond2     3               0.124         22.3    223. 
## 10 10     200   50%   cond2     4               0.186         28.8    288. 
## # ℹ 98 more rows
## # ℹ 3 more variables: n <int>, sd <dbl>, sem <dbl>

a quick plot of the data

p11.1 <- ggplot(sx2_sim, 
                aes(x = bin, y = sim_mean_outcome, 
                    colour = condition)) +
  geom_jitter(data=sx2_exp, aes(y = exp_mean_outcome),
              alpha=0.5, width = 0.1, height = 0) +
  geom_line(aes(group = condition)) + 
  geom_point(size=2, colour = "black") +
  geom_errorbar(aes(ymin = sim_mean_outcome-sem*1.96, ymax = sim_mean_outcome+sem*1.96),
                width=.2, colour = "black") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="timebin",
       y="outcome") + 
  facet_grid(fct_rev(subj_n)~b6c)
p11.1

ggsave("Tutorial_4_planning/figures/sim2/sim_violin.jpeg",
       width = 10, height = 8, dpi=800)

ok, look at some values in bin 6

at the exp level

sx2_exp_check <- sx2_exp %>%
  filter(bin %in% c(6)) %>% 
  select(exp, subj_n, b6c, condition, bin, exp_mean_outcome) %>%
  pivot_wider(names_from = "condition",
              values_from = "exp_mean_outcome") %>% 
  group_by(exp, subj_n, b6c, bin) %>% 
  mutate(ratio = cond2/cond1) 
head(sx2_exp_check)
## # A tibble: 6 × 7
## # Groups:   exp, subj_n, b6c, bin [6]
##   exp   subj_n b6c   bin   cond1 cond2 ratio
##   <fct> <fct>  <fct> <fct> <dbl> <dbl> <dbl>
## 1 1     10     50%   6     0.306 0.143 0.466
## 2 1     10     60%   6     0.295 0.119 0.403
## 3 1     10     70%   6     0.399 0.104 0.261
## 4 1     15     50%   6     0.185 0.113 0.613
## 5 1     15     60%   6     0.234 0.129 0.550
## 6 1     15     70%   6     0.327 0.123 0.376

and at the sim level

sx2_sim_check <- sx2_exp_check %>%
  group_by(subj_n, b6c, bin) %>% 
  summarise(sim_ratio = mean(ratio),
            sim_sd=sd(ratio),
            sim_sem=sim_sd/sqrt(1000), # 1000 = the number of simulated exps
            .groups = "drop")
head(sx2_sim_check)
## # A tibble: 6 × 6
##   subj_n b6c   bin   sim_ratio sim_sd sim_sem
##   <fct>  <fct> <fct>     <dbl>  <dbl>   <dbl>
## 1 10     50%   6         0.517 0.142  0.00450
## 2 10     60%   6         0.429 0.118  0.00373
## 3 10     70%   6         0.328 0.0888 0.00281
## 4 15     50%   6         0.513 0.125  0.00395
## 5 15     60%   6         0.417 0.0968 0.00306
## 6 15     70%   6         0.327 0.0722 0.00228
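as a sanity check, these recovered ratios line up with the simulated bin 6 reductions: a reduction of r implies a cond2/cond1 ratio of 1 - r. A quick sketch of that mapping (the values here are just the factor labels, not computed from sx2):

```r
# the b6c factor encodes the simulated bin 6 reduction; the implied
# cond2/cond1 hazard ratio is 1 minus that reduction
reduction <- c("50%" = 0.50, "60%" = 0.60, "70%" = 0.70)
implied_ratio <- 1 - reduction
implied_ratio
##  50%  60%  70% 
##  0.5  0.4  0.3
```

the sim_ratio values above (roughly 0.52, 0.43 and 0.33) sit close to these implied ratios, which suggests the simulation is recovering the intended effect sizes.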

plot ratio values

p11.2 <- ggplot(sx2_sim_check, 
                aes(x = b6c, y = sim_ratio)) +
  geom_point(size=1.5) +
  geom_errorbar(aes(ymin = sim_ratio-sim_sem*1.96, ymax = sim_ratio+sim_sem*1.96)) + 
  geom_hline(yintercept = c(0.3, 0.4, 0.5), colour = "red") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="bin 6 reduction (b6c)",
       y="ratio (cond2/cond1)") + 
  facet_wrap(~subj_n)
p11.2

calculate power / precision

basic idea - calculate the difference score in hazard between cond1 and cond2 per bin and then report how many lower-bound quantiles exclude zero. This is, of course, not a model-based estimate, but it should give a good guide to what we might expect.
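to make the idea concrete, here is a minimal self-contained sketch of this CI-based power calculation (toy values, not the sx2 data; `true_diff` and `subj_sd` are assumptions chosen purely for illustration): simulate many experiments, compute each experiment's 95% CI on the mean difference, and count how often the lower bound excludes zero

```r
set.seed(123)

n_exp <- 1000      # number of simulated experiments
n_subj <- 10       # subjects per experiment
true_diff <- 0.15  # assumed true mean hazard difference (illustrative)
subj_sd <- 0.18    # assumed between-subject sd of the difference (illustrative)

# lower bound of the 95% CI on the mean difference, for each experiment
lower_bounds <- replicate(n_exp, {
  diffs <- rnorm(n_subj, mean = true_diff, sd = subj_sd)
  mean(diffs) - 1.96 * sd(diffs) / sqrt(n_subj)
})

# "power" = proportion of experiments whose CI excludes zero
mean(lower_bounds > 0)
```

the code below does exactly this, but on the simulated sx2 data and per cell of the design.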

calculate difference scores

at the exp level

sx2_diff_exp <- sx2 %>%
  pivot_wider(id_cols = c(exp, subj_n, rep_n, b6c, subj, bin),
              names_from = "condition",
              values_from = "mean_outcome") %>% 
  mutate(diff = cond1 - cond2) %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  summarise(exp_mean_diff = mean(diff, na.rm = TRUE),
            exp_sd = sd(diff, na.rm = TRUE),
            n = n(), # n here is the number of subjects per experiment
            exp_sem = (exp_sd/sqrt(n)),
            exp_ci = 1.96*exp_sem,
            exp_dz = exp_mean_diff/exp_sd,
            .groups = "drop") 
head(sx2_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     200   50%   1           0.00100 0.0165    10 0.00521 0.0102
## 2 1     10     200   50%   2           0.00386 0.0777    10 0.0246  0.0482
## 3 1     10     200   50%   3          -0.0676  0.108     10 0.0342  0.0671
## 4 1     10     200   50%   4          -0.0189  0.258     10 0.0816  0.160 
## 5 1     10     200   50%   5           0.221   0.184     10 0.0580  0.114 
## 6 1     10     200   50%   6           0.163   0.176     10 0.0558  0.109 
## # ℹ 1 more variable: exp_dz <dbl>
tail(sx2_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1000  20     200   70%   1          -0.00675 0.0369    20 0.00825 0.0162
## 2 1000  20     200   70%   2          -0.0174  0.0696    20 0.0156  0.0305
## 3 1000  20     200   70%   3           0.0574  0.0611    20 0.0137  0.0268
## 4 1000  20     200   70%   4           0.0291  0.231     20 0.0517  0.101 
## 5 1000  20     200   70%   5           0.180   0.131     20 0.0294  0.0576
## 6 1000  20     200   70%   6           0.277   0.184     20 0.0413  0.0809
## # ℹ 1 more variable: exp_dz <dbl>

at the group/sim level

sx2_diff_sim <- sx2_diff_exp %>% 
  group_by(subj_n, rep_n, b6c, bin) %>% 
  summarise(mean_diff = mean(exp_mean_diff, na.rm = TRUE),
            sd = sd(exp_mean_diff, na.rm = TRUE),
            n=n(),
            sem = (sd/sqrt((n))),
            ci = 1.96*sem) 
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
sx2_diff_sim
## # A tibble: 54 × 9
## # Groups:   subj_n, rep_n, b6c [9]
##    subj_n rep_n b6c   bin   mean_diff     sd     n      sem       ci
##    <fct>  <fct> <fct> <fct>     <dbl>  <dbl> <int>    <dbl>    <dbl>
##  1 10     200   50%   1      -0.00728 0.0125  1000 0.000394 0.000772
##  2 10     200   50%   2       0.0126  0.0408  1000 0.00129  0.00253 
##  3 10     200   50%   3      -0.0157  0.0386  1000 0.00122  0.00239 
##  4 10     200   50%   4       0.0320  0.0734  1000 0.00232  0.00455 
##  5 10     200   50%   5       0.192   0.0572  1000 0.00181  0.00355 
##  6 10     200   50%   6       0.118   0.0496  1000 0.00157  0.00308 
##  7 10     200   60%   1      -0.00780 0.0139  1000 0.000439 0.000861
##  8 10     200   60%   2       0.0138  0.0391  1000 0.00124  0.00242 
##  9 10     200   60%   3      -0.0147  0.0388  1000 0.00123  0.00240 
## 10 10     200   60%   4       0.0335  0.0685  1000 0.00217  0.00424 
## # ℹ 44 more rows

plot

violin

diff in original units

p11.3 <- ggplot(sx2_diff_exp, aes(x=bin, y = exp_mean_diff,
                                colour = bin, fill = bin)) +
   geom_jitter(alpha = 0.5, width = 0.1) +
   geom_violin(alpha = 0.7) +
   # pd2 is a position adjustment (e.g., position_dodge) defined earlier in the tutorial
   geom_point(data = sx2_diff_sim, 
             aes(y = mean_diff), size = 3, position=pd2, colour="black") +
   geom_errorbar(data = sx2_diff_sim,
                aes(y = mean_diff, ymin = mean_diff-ci, ymax = mean_diff+ci),
                width=.2, position=pd2, colour = "black") +
   geom_hline(yintercept = 0, colour = "black", linetype = "dashed") +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   theme(legend.position = "none") +
   ylab("mean difference") +
   ggtitle("difference score (cond1-cond2)") +
   facet_grid(fct_rev(subj_n)~b6c)
p11.3

ggsave("Tutorial_4_planning/figures/sim2/sim_diffs.jpeg",
       width = 10, height = 8, dpi=800)

plot each exp’s difference score and associated 95% interval

create some factors and new variables

sx2_diff_exp_2 <- sx2_diff_exp %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  mutate(lower = exp_mean_diff-exp_ci,
         upper = exp_mean_diff+exp_ci,
         above_zero = if_else(lower > 0, "yes", "no"), 
         above_zero = factor(above_zero, levels = c("no", "yes")))
sx2_diff_exp_2
## # A tibble: 54,000 × 14
## # Groups:   exp, subj_n, rep_n, b6c, bin [54,000]
##    exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem  exp_ci
##    <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>   <dbl>
##  1 1     10     200   50%   1           0.00100 0.0165    10 0.00521 0.0102 
##  2 1     10     200   50%   2           0.00386 0.0777    10 0.0246  0.0482 
##  3 1     10     200   50%   3          -0.0676  0.108     10 0.0342  0.0671 
##  4 1     10     200   50%   4          -0.0189  0.258     10 0.0816  0.160  
##  5 1     10     200   50%   5           0.221   0.184     10 0.0580  0.114  
##  6 1     10     200   50%   6           0.163   0.176     10 0.0558  0.109  
##  7 1     10     200   60%   1          -0.006   0.0156    10 0.00493 0.00967
##  8 1     10     200   60%   2          -0.0135  0.0560    10 0.0177  0.0347 
##  9 1     10     200   60%   3           0.0214  0.0878    10 0.0278  0.0544 
## 10 1     10     200   60%   4          -0.00657 0.220     10 0.0696  0.136  
## # ℹ 53,990 more rows
## # ℹ 4 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>

calculate “power” in a quick and dirty way based on 95% CI

power_x2 <- sx2_diff_exp_2 %>%
  group_by(subj_n, rep_n, b6c, bin) %>%
  mutate(check = ifelse(lower > 0, 1, 0)) %>%
  summarise(power = mean(check, na.rm = TRUE)) %>% 
  filter(bin %in% c(6))
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
power_x2
## # A tibble: 9 × 5
## # Groups:   subj_n, rep_n, b6c [9]
##   subj_n rep_n b6c   bin   power
##   <fct>  <fct> <fct> <fct> <dbl>
## 1 10     200   50%   6     0.682
## 2 10     200   60%   6     0.869
## 3 10     200   70%   6     0.983
## 4 15     200   50%   6     0.831
## 5 15     200   60%   6     0.956
## 6 15     200   70%   6     0.997
## 7 20     200   50%   6     0.932
## 8 20     200   60%   6     0.986
## 9 20     200   70%   6     0.999
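as a rough cross-check on these CI-based estimates: asking whether the 95% CI lower bound exceeds zero is close to running a one-tailed one-sample t-test at alpha = .025 on the standardized effect dz, so `stats::power.t.test` gives a quick analytic approximation. A sketch (the dz value below is an assumption for illustration, not estimated from sx2):

```r
# approximate the CI-based "power" analytically from a standardized
# effect size dz and a sample size n (one-sample, one-tailed, alpha = .025)
approx_power <- function(dz, n) {
  power.t.test(n = n, delta = dz, sd = 1, sig.level = 0.025,
               type = "one.sample", alternative = "one.sided")$power
}
approx_power(dz = 0.8, n = 15)  # illustrative dz, not taken from sx2
```

if the analytic values land near the simulation-based ones for the observed exp_dz, that is reassuring; large discrepancies would suggest the normality assumptions behind the t-test are doing real work.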

plot power

p11.4 <- ggplot(power_x2, aes(x = b6c, y=subj_n, fill = power)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.3f", power)), color = "white", size = 10) +
  scale_fill_viridis_c(limits = c(0, 1)) +
  facet_wrap(~bin) +
  labs(y="subj_n", x="b6c")
p11.4

ggsave("Tutorial_4_planning/figures/sim2/power.jpeg",
       width = 10, height = 6, dpi=800)

join power to the df

sx2_diff_power <- sx2_diff_exp_2 %>%
  filter(bin %in% c(6)) %>% 
  inner_join(power_x2, by = c("subj_n", "rep_n", "b6c", "bin")) %>% 
  mutate(power = round(power * 100, 2)) 
head(sx2_diff_power)
## # A tibble: 6 × 15
## # Groups:   exp, subj_n, rep_n, b6c, bin [6]
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     200   50%   6            0.163  0.176     10  0.0558 0.109 
## 2 1     10     200   60%   6            0.176  0.138     10  0.0435 0.0853
## 3 1     10     200   70%   6            0.295  0.221     10  0.0698 0.137 
## 4 1     15     200   50%   6            0.0716 0.0938    15  0.0242 0.0475
## 5 1     15     200   60%   6            0.106  0.135     15  0.0348 0.0683
## 6 1     15     200   70%   6            0.204  0.155     15  0.0401 0.0785
## # ℹ 5 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>,
## #   power <dbl>

plot

p11.5 <- sx2_diff_power %>%
  ggplot(aes(x = exp, y = exp_mean_diff, ymin = lower, ymax = upper)) +
  geom_pointrange(fatten = 1/2, aes(colour=above_zero)) +
  geom_hline(yintercept = 0, colour = "red") +
  scale_colour_manual(values=c("darkgrey","black")) +
  geom_text(aes(x=700, y=-0.35, label = sprintf("%.1f%s", power, "% power")), 
            color = "darkgrey", size = 5) +
  theme(legend.position = "none") +
  labs(x = "sim # (i.e., simulation index)",
       y = "hazard difference") +
  scale_x_discrete(breaks = c(250,500,750,1000)) +
  facet_grid(fct_rev(subj_n)~b6c)
p11.5

ggsave("Tutorial_4_planning/figures/sim2/bin6_diffs.jpeg",
       width = 10, height = 8, dpi=800)

plot power as a bar plot

p11.6 <- ggplot(power_x2, aes(x=subj_n, y=power,
                           colour = b6c, fill = b6c)) +
  geom_col(alpha = 0.5) +
  geom_hline(yintercept = 0.8, colour = "red", linetype = "dashed") +
  geom_hline(yintercept = 0.9, colour = "black", linetype = "dashed") +
  geom_hline(yintercept = 0.95, colour = "blue", linetype = "dashed") +
  scale_colour_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none") +
  facet_wrap(~b6c, nrow=1)
p11.6

ggsave("Tutorial_4_planning/figures/sim2/bin6_power_col.jpeg",
       width = 8, height = 6, dpi=800)

A summary of these power calculations might be as follows:

For a 70% reduction (0.3 hazard ratio), N=10 with 200 trials per condition would give nearly 100% power. For a 60% reduction (0.4 hazard ratio), N=10 with 200 trials per condition would give nearly 90% power. For a 50% reduction (0.5 hazard ratio), N=15 with 200 trials per condition would give over 80% power.

And a conclusion / judgment / decision process might be:

Well, like almost always, it depends on your objectives. Some considerations might be…

How much power or precision are you looking to obtain in this particular study? Are you running multiple studies that have some form of replication built in? What resources do you have at your disposal, such as time, money and personnel? How easy or difficult is it to obtain the specific type of sample?

My thoughts for studies in my lab might be something like this…

Pick 0.4 or 0.5 as a target effect size since this is much smaller than that observed in published studies, then pick the corresponding N value (i.e., N=10 or N=15) that takes you over the 80% power mark.

But, and this is an important “but”, do not rely solely on one study. Run a follow-up experiment that replicates and extends the initial result. By doing so, you avoid the Cult of the Isolated Single Study and reduce the reliance on any one type of power analysis. Instead, you aim for common patterns across two or more experiments, rather than trying to make the case that a single study has sufficient evidential value to hit some criterion mark.